Rationally-designed meganucleases with altered sequence specificity and DNA-binding affinity

ABSTRACT

Rationally-designed LAGLIDADG meganucleases and methods of making such meganucleases are provided. In addition, methods are provided for using the meganucleases to generate recombinant cells and organisms having a desired DNA sequence inserted into a limited number of loci within the genome, as well as methods of gene therapy, for treatment of pathogenic infections, and for in vitro applications in diagnostics and research.

RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 14/463,059 filed Aug. 19, 2014, which is a continuation of U.S. patent application Ser. No. 13/861,106 filed Apr. 11, 2013, now abandoned, which is a continuation of U.S. patent application Ser. No. 13/223,852 filed Sep. 1, 2011, now abandoned, which is a continuation of U.S. patent application Ser. No. 11/583,368 filed Oct. 18, 2006, now U.S. Pat. No. 8,021,867, which claims benefit of priority to U.S. Provisional Patent Application No. 60/727,512, filed Oct. 18, 2005, the disclosures of all of the foregoing of which are hereby incorporated by reference in their entireties.

GOVERNMENT SUPPORT

The invention was supported in part by grants 2R01-GM-0498712, 5F32-GM072322 and 5 DP1 OD000122 from the National Institute of General Medical Sciences of the National Institutes of Health of the United States of America. Therefore, the U.S. government may have certain rights in the invention.

FIELD OF THE INVENTION

The invention relates to the field of molecular biology and recombinant nucleic acid technology. In particular, the invention relates to rationally-designed, non-naturally-occurring meganucleases with altered DNA recognition sequence specificity and/or altered affinity. The invention also relates to methods of producing such meganucleases, and methods of producing recombinant nucleic acids and organisms using such meganucleases.

BACKGROUND OF THE INVENTION

Genome engineering requires the ability to insert, delete, substitute and otherwise manipulate specific genetic sequences within a genome, and has numerous therapeutic and biotechnological applications. The development of effective means for genome modification remains a major goal in gene therapy, agrotechnology, and synthetic biology (Porteus et al. (2005), Nat. Biotechnol. 23: 967-73; Tzfira et al. (2005), Trends Biotechnol. 23: 567-9; McDaniel et al. (2005), Curr. Opin. Biotechnol. 16: 476-83). A common method for inserting or modifying a DNA sequence involves introducing a transgenic DNA sequence flanked by sequences homologous to the genomic target and selecting or screening for a successful homologous recombination event. Recombination with the transgenic DNA occurs rarely but can be stimulated by a double-stranded break in the genomic DNA at the target site. Numerous methods have been employed to create DNA double-stranded breaks, including irradiation and chemical treatments. Although these methods efficiently stimulate recombination, the double-stranded breaks are randomly dispersed in the genome, which can be highly mutagenic and toxic. At present, the inability to target gene modifications to unique sites within a chromosomal background is a major impediment to successful genome engineering.

One approach to achieving this goal is stimulating homologous recombination at a double-stranded break in a target locus using a nuclease with specificity for a sequence that is sufficiently large to be present at only a single site within the genome (see, e.g., Porteus et al. (2005), Nat. Biotechnol. 23: 967-73). The effectiveness of this strategy has been demonstrated in a variety of organisms using chimeric fusions between an engineered zinc finger DNA-binding domain and the non-specific nuclease domain of the FokI restriction enzyme (Porteus (2006), Mol Ther 13: 438-46; Wright et al. (2005), Plant J. 44: 693-705; Urnov et al. (2005), Nature 435: 646-51). Although these artificial zinc finger nucleases stimulate site-specific recombination, they retain residual non-specific cleavage activity resulting from under-regulation of the nuclease domain and frequently cleave at unintended sites (Smith et al. (2000), Nucleic Acids Res. 28: 3361-9). Such unintended cleavage can cause mutations and toxicity in the treated organism (Porteus et al. (2005), Nat. Biotechnol. 23: 967-73).

A group of naturally-occurring nucleases which recognize 15-40 base-pair cleavage sites commonly found in the genomes of plants and fungi may provide a less toxic genome engineering alternative. Such “meganucleases” or “homing endonucleases” are frequently associated with parasitic DNA elements, such as group I self-splicing introns and inteins. They naturally promote homologous recombination or gene insertion at specific locations in the host genome by producing a double-stranded break in the chromosome, which recruits the cellular DNA-repair machinery (Stoddard (2006), Q. Rev. Biophys. 38: 49-95). Meganucleases are commonly grouped into four families: the LAGLIDADG family, the GIY-YIG family, the His-Cys box family and the HNH family. These families are characterized by structural motifs, which affect catalytic activity and recognition sequence. For instance, members of the LAGLIDADG family are characterized by having either one or two copies of the conserved LAGLIDADG motif (see Chevalier et al. (2001), Nucleic Acids Res. 29(18): 3757-3774). The LAGLIDADG meganucleases with a single copy of the LAGLIDADG motif form homodimers, whereas members with two copies of the LAGLIDADG motif are found as monomers. Similarly, the GIY-YIG family members have a GIY-YIG module, which is 70-100 residues long and includes four or five conserved sequence motifs with four invariant residues, two of which are required for activity (see Van Roey et al. (2002), Nature Struct. Biol. 9: 806-811). The His-Cys box meganucleases are characterized by a highly conserved series of histidines and cysteines over a region encompassing several hundred amino acid residues (see Chevalier et al. (2001), Nucleic Acids Res. 29(18): 3757-3774). In the case of the HNH family, the members are defined by motifs containing two pairs of conserved histidines surrounded by asparagine residues (see Chevalier et al. (2001), Nucleic Acids Res. 29(18): 3757-3774). The four families of meganucleases are widely separated from one another with respect to conserved structural elements and, consequently, DNA recognition sequence specificity and catalytic activity.

Natural meganucleases, primarily from the LAGLIDADG family, have been used to effectively promote site-specific genome modification in plants, yeast, Drosophila, mammalian cells and mice, but this approach has been limited to the modification either of homologous genes that conserve the meganuclease recognition sequence (Monnat et al. (1999), Biochem. Biophys. Res. Commun. 255: 88-93) or of pre-engineered genomes into which a recognition sequence has been introduced (Rouet et al. (1994), Mol. Cell. Biol. 14: 8096-106; Chilton et al. (2003), Plant Physiol. 133: 956-65; Puchta et al. (1996), Proc. Natl. Acad. Sci. USA 93: 5055-60; Rong et al. (2002), Genes Dev. 16: 1568-81; Gouble et al. (2006), J. Gene Med. 8(5): 616-622).

Systematic implementation of nuclease-stimulated gene modification requires the use of engineered enzymes with customized specificities to target DNA breaks to existing sites in a genome and, therefore, there has been great interest in adapting meganucleases to promote gene modifications at medically or biotechnologically relevant sites (Porteus et al. (2005), Nat. Biotechnol. 23: 967-73; Sussman et al. (2004), J. Mol. Biol. 342: 31-41; Epinat et al. (2003), Nucleic Acids Res. 31: 2952-62).

The meganuclease I-CreI from Chlamydomonas reinhardtii is a member of the LAGLIDADG family which recognizes and cuts a 22 base-pair recognition sequence in the chloroplast chromosome, and which presents an attractive target for meganuclease redesign. The wild-type enzyme is a homodimer in which each monomer makes direct contacts with 9 base pairs in the full-length recognition sequence. Genetic selection techniques have been used to identify mutations in I-CreI that alter base preference at a single position in this recognition sequence (Sussman et al. (2004), J. Mol. Biol. 342: 31-41; Chames et al. (2005), Nucleic Acids Res. 33: e178; Seligman et al. (2002), Nucleic Acids Res. 30: 3870-9) or, more recently, at three positions in the recognition sequence (Arnould et al. (2006), J. Mol. Biol. 355: 443-58). The I-CreI protein-DNA interface contains nine amino acids that contact the DNA bases directly and at least an additional five positions that can form potential contacts in modified interfaces. The size of this interface imposes a combinatorial complexity that is unlikely to be sampled adequately in sequence libraries constructed to select for enzymes with drastically altered cleavage sites.

There remains a need for nucleases that will facilitate precise modification of a genome. In addition, there remains a need for techniques for generating nucleases with pre-determined, rationally-designed recognition sequences that will allow manipulation of genetic sequences at specific genetic loci and for techniques utilizing such nucleases to genetically engineer organisms with precise sequence modifications.

SUMMARY OF THE INVENTION

The present invention is based, in part, upon the identification and characterization of specific amino acid residues in the LAGLIDADG family of meganucleases that make contacts with DNA bases and the DNA backbone when the meganucleases associate with a double-stranded DNA recognition sequence, and thereby affect the specificity and activity of the enzymes. This discovery has been used, as described in detail below, to identify amino acid substitutions which can alter the recognition sequence specificity and/or DNA-binding affinity of the meganucleases, and to rationally design and develop meganucleases that can recognize a desired DNA sequence that naturally-occurring meganucleases do not recognize. The invention also provides methods that use such meganucleases to produce recombinant nucleic acids and organisms by utilizing the meganucleases to cause recombination of a desired genetic sequence at a limited number of loci within the genome of the organism, for gene therapy, for treatment of pathogenic infections, and for in vitro applications in diagnostics and research.

Thus, in some embodiments, the invention provides recombinant meganucleases having altered specificity for at least one recognition sequence half-site relative to a wild-type I-CreI meganuclease, in which the meganuclease includes a polypeptide having at least 85% sequence similarity to residues 2-153 of the wild-type I-CreI meganuclease of SEQ ID NO: 1, but in which the recombinant meganuclease has specificity for a recognition sequence half-site which differs by at least one base pair from a half-site within an I-CreI meganuclease recognition sequence selected from SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4 and SEQ ID NO: 5, and in which the recombinant meganuclease includes at least one modification listed in Table 1 which is not an excluded modification found in the prior art.

In other embodiments, the invention provides recombinant meganucleases having altered specificity for at least one recognition sequence half-site relative to a wild-type I-MsoI meganuclease, in which the meganuclease includes a polypeptide having at least 85% sequence similarity to residues 6-160 of the I-MsoI meganuclease of SEQ ID NO: 6, but in which the recombinant meganuclease has specificity for a recognition sequence half-site which differs by at least one base pair from a half-site within an I-MsoI meganuclease recognition sequence selected from SEQ ID NO: 7 and SEQ ID NO: 8, and in which the recombinant meganuclease includes at least one modification listed in Table 2 which is not an excluded modification found in the prior art.

In other embodiments, the invention provides recombinant meganucleases having altered specificity for a recognition sequence relative to a wild-type I-SceI meganuclease, in which the meganuclease includes a polypeptide having at least 85% sequence similarity to residues 3-186 of the I-SceI meganuclease of SEQ ID NO: 9, but in which the recombinant meganuclease has specificity for a recognition sequence which differs by at least one base pair from an I-SceI meganuclease recognition sequence of SEQ ID NO: 10 and SEQ ID NO: 11, and in which the recombinant meganuclease includes at least one modification listed in Table 3 which is not an excluded modification found in the prior art.

In other embodiments, the invention provides recombinant meganucleases having altered specificity for at least one recognition sequence half-site relative to a wild-type I-CeuI meganuclease, in which the meganuclease includes a polypeptide having at least 85% sequence similarity to residues 5-211 of the I-CeuI meganuclease of SEQ ID NO: 12, but in which the recombinant meganuclease has specificity for a recognition sequence half-site which differs by at least one base pair from a half-site within an I-CeuI meganuclease recognition sequence selected from SEQ ID NO: 13 and SEQ ID NO: 14, and in which the recombinant meganuclease includes at least one modification listed in Table 4 which is not an excluded modification found in the prior art.

The meganucleases of the invention can include one, two, three or more of the modifications which have been disclosed herein in order to affect the sequence specificity of the recombinant meganucleases at one, two, three or more positions within the recognition sequence. The meganucleases can include only the novel modifications disclosed herein, or can include the novel modifications disclosed herein in combination with modifications found in the prior art. Specifically excluded, however, are recombinant meganucleases comprising only the modifications of the prior art.

In another aspect, the invention provides for recombinant meganucleases with altered binding affinity for double-stranded DNA which is not sequence-specific. This is accomplished by modifications of the meganuclease residues which make contacts with the backbone of the double-stranded DNA recognition sequence. The modifications can increase or decrease the binding affinity and, consequently, can increase or decrease the overall activity of the enzyme. Moreover, increases/decreases in binding and activity have been found to cause decreases/increases in sequence specificity. Thus, the invention provides a means for altering sequence specificity generally by altering DNA-binding affinity.

Thus, in some embodiments, the invention provides for recombinant meganucleases having altered binding affinity for double-stranded DNA relative to a wild-type I-CreI meganuclease, in which the meganuclease includes a polypeptide having at least 85% sequence similarity to residues 2-153 of the I-CreI meganuclease of SEQ ID NO: 1, and in which the DNA-binding affinity has been either (1) increased by at least one modification corresponding to a substitution selected from (a) substitution of E80, D137, I81, L112, P29, V64 or Y66 with H, N, Q, S, T, K or R, or (b) substitution of T46, T140 or T143 with K or R; or, conversely, (2) decreased by at least one modification corresponding to a substitution selected from (a) substitution of K34, K48, R51, K82, K116 or K139 with H, N, Q, S, T, D or E, or (b) substitution of I81, L112, P29, V64, Y66, T46, T140 or T143 with D or E.

In other embodiments, the invention provides for recombinant meganucleases having altered binding affinity for double-stranded DNA relative to a wild-type I-MsoI meganuclease, in which the meganuclease includes a polypeptide having at least 85% sequence similarity to residues 6-160 of the I-MsoI meganuclease of SEQ ID NO: 6, and in which the DNA-binding affinity has been either (1) increased by at least one modification corresponding to a substitution selected from (a) substitution of E147, I85, G86 or Y118 with H, N, Q, S, T, K or R, or (b) substitution of Q41, N70, S87, T88, H89, Q122, Q139, S150 or N152 with K or R; or, conversely, (2) decreased by at least one modification corresponding to a substitution selected from (a) substitution of K36, R51, K123, K143 or R144 with H, N, Q, S, T, D or E, or (b) substitution of I85, G86, Y118, Q41, N70, S87, T88, H89, Q122, Q139, S150 or N152 with D or E.

In other embodiments, the invention provides for recombinant meganucleases having altered binding affinity for double-stranded DNA relative to a wild-type I-SceI meganuclease, in which the meganuclease includes a polypeptide having at least 85% sequence similarity to residues 3-186 of the I-SceI meganuclease of SEQ ID NO: 9, and in which the DNA-binding affinity has been either (1) increased by at least one modification corresponding to a substitution selected from (a) substitution of D201, L19, L80, L92, Y151, Y188, I191, Y199 or Y222 with H, N, Q, S, T, K or R, or (b) substitution of N15, N17, S81, H84, N94, N120, T156, N157, S159, N163, Q165, S166, N194 or S202 with K or R; or, conversely, (2) decreased by at least one modification corresponding to a substitution selected from (a) substitution of K20, K23, K63, K122, K148, K153, K190, K193, K195 or K223 with H, N, Q, S, T, D or E, or (b) substitution of L19, L80, L92, Y151, Y188, I191, Y199, Y222, N15, N17, S81, H84, N94, N120, T156, N157, S159, N163, Q165, S166, N194 or S202 with D or E.

In other embodiments, the invention provides for recombinant meganucleases having altered binding affinity for double-stranded DNA relative to a wild-type I-CeuI meganuclease, in which the meganuclease includes a polypeptide having at least 85% sequence similarity to residues 5-211 of the I-CeuI meganuclease of SEQ ID NO: 12, and in which the DNA-binding affinity has been either (1) increased by at least one modification corresponding to a substitution selected from (a) substitution of D25 or D128 with H, N, Q, S, T, K or R, or (b) substitution of S68, N70, H94, S117, N120, N129 or H172 with K or R; or, conversely, (2) decreased by at least one modification corresponding to a substitution selected from (a) substitution of K21, K28, K31, R112, R114 or R130 with H, N, Q, S, T, D or E, or (b) substitution of S68, N70, H94, S117, N120, N129 or H172 with D or E.

The meganucleases of the invention can include one, two, three or more of the modifications of backbone contact residues which have been disclosed herein in order to affect DNA-binding affinity. In addition, these modifications affecting DNA-binding affinity can be combined with one or more of the novel modifications of the base contact residues described above which alter the sequence specificity of the recombinant meganucleases at specific positions within the recognition sequence, or with the prior art modifications described above, or with a combination of the novel modifications and prior art modifications. In particular, by combining backbone contact modifications and base contact modifications, recombinant meganucleases can be rationally-designed with desired specificity and activity. For example, increases in DNA-binding affinity can be designed which may offset losses in affinity resulting from designed changes to base contact residues, or decreases in affinity can be designed which may also decrease sequence specificity and broaden the set of recognition sequences for an enzyme.

In another aspect, the invention provides for rationally-designed meganuclease monomers with altered affinity for homo- or heterodimer formation. The affinity for dimer formation can be measured with the same monomer (i.e., homodimer formation) or with a different monomer (i.e., heterodimer formation) such as a reference wild-type meganuclease. These recombinant meganucleases have modifications to the amino acid residues which are present at the protein-protein interface between monomers in a meganuclease dimer. The modifications can be used to promote heterodimer formation and create meganucleases with non-palindromic recognition sequences.

Thus, in some embodiments, the invention provides recombinant meganuclease monomers having altered affinity for dimer formation with a reference meganuclease monomer, in which the recombinant monomer includes a polypeptide having at least 85% sequence similarity to residues 2-153 of the I-CreI meganuclease of SEQ ID NO: 1, but in which affinity for dimer formation has been altered by at least one modification corresponding to a substitution selected from (a) substitution of K7, K57 or K96 with D or E, or (b) substitution of E8 or E61 with K or R. Based upon such recombinant monomers, the invention also provides recombinant meganuclease heterodimers including (1) a first polypeptide having at least 85% sequence similarity to residues 2-153 of the I-CreI meganuclease of SEQ ID NO: 1, but in which affinity for dimer formation has been altered by at least one modification corresponding to a substitution selected from (a) substitution of K7, K57 or K96 with D or E, and (2) a second polypeptide having at least 85% sequence similarity to residues 2-153 of the I-CreI meganuclease of SEQ ID NO: 1, but in which affinity for dimer formation has been altered by at least one modification corresponding to a substitution selected from (b) substitution of E8 or E61 with K or R.
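The I-CreI heterodimer design just described pairs a monomer carrying a class (a) substitution (a lysine replaced by D or E) with a monomer carrying a class (b) substitution (a glutamate replaced by K or R). The following is a minimal illustrative sketch, not part of the disclosed methods, of how such a pairing could be checked in software. The mutation notation (e.g., "K7E"), the function names and the set names are assumptions introduced here for illustration; only the residue positions and replacement residues are taken from the text above.

```python
import re

# I-CreI interface positions named in the text above.
# Class (a): K7, K57 or K96 replaced by D or E.
CLASS_A_SITES = {("K", 7), ("K", 57), ("K", 96)}
# Class (b): E8 or E61 replaced by K or R.
CLASS_B_SITES = {("E", 8), ("E", 61)}


def classify(mutation):
    """Return 'a', 'b' or None for a mutation written like 'K7E'.

    The 'wild-type letter, position, new letter' notation is an
    assumption of this sketch.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    if (wild_type, position) in CLASS_A_SITES and new in "DE":
        return "a"
    if (wild_type, position) in CLASS_B_SITES and new in "KR":
        return "b"
    return None


def is_complementary_pair(first_monomer, second_monomer):
    """True if the first monomer has at least one class (a) substitution
    and the second monomer has at least one class (b) substitution."""
    has_a = any(classify(m) == "a" for m in first_monomer)
    has_b = any(classify(m) == "b" for m in second_monomer)
    return has_a and has_b
```

For example, under these assumptions, `is_complementary_pair(["K7E", "K57D"], ["E8K"])` evaluates to true, whereas two monomers that both carry only class (b) substitutions do not form a designed pair.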

In other embodiments, the invention provides recombinant meganuclease monomers having altered affinity for dimer formation with a reference meganuclease monomer, in which the recombinant monomer includes a polypeptide having at least 85% sequence similarity to residues 6-160 of the I-MsoI meganuclease of SEQ ID NO: 6, but in which affinity for dimer formation has been altered by at least one modification corresponding to a substitution selected from (a) substitution of R302 with D or E, or (b) substitution of D20, E11 or Q64 with K or R. Based upon such recombinant monomers, the invention also provides recombinant meganuclease heterodimers including (1) a first polypeptide having at least 85% sequence similarity to residues 6-160 of the I-MsoI meganuclease of SEQ ID NO: 6, but in which affinity for dimer formation has been altered by at least one modification corresponding to a substitution selected from (a) substitution of R302 with D or E, and (2) a second polypeptide having at least 85% sequence similarity to residues 6-160 of the I-MsoI meganuclease of SEQ ID NO: 6, but in which affinity for dimer formation has been altered by at least one modification corresponding to a substitution selected from (b) substitution of D20, E11 or Q64 with K or R.

In other embodiments, the invention provides recombinant meganuclease monomers having altered affinity for dimer formation with a reference meganuclease monomer, in which the recombinant monomer includes a polypeptide having at least 85% sequence similarity to residues 5-211 of the I-CeuI meganuclease of SEQ ID NO: 12, but in which affinity for dimer formation has been altered by at least one modification corresponding to a substitution selected from (a) substitution of R93 with D or E, or (b) substitution of E152 with K or R. Based upon such recombinant monomers, the invention also provides recombinant meganuclease heterodimers including (1) a first polypeptide having at least 85% sequence similarity to residues 5-211 of the I-CeuI meganuclease of SEQ ID NO: 12, but in which affinity for dimer formation has been altered by at least one modification corresponding to a substitution selected from (a) substitution of R93 with D or E, and (2) a second polypeptide having at least 85% sequence similarity to residues 5-211 of the I-CeuI meganuclease of SEQ ID NO: 12, but in which affinity for dimer formation has been altered by at least one modification corresponding to a substitution selected from (b) substitution of E152 with K or R.

The recombinant meganuclease monomers or heterodimers with altered affinity for dimer formation can also include one, two, three or more of the modifications of base contact residues described above; one, two, three or more of the modifications of backbone contact residues described above; or combinations of both. Thus, for example, the base contacts of a monomer can be modified to alter sequence specificity, the backbone contacts of a monomer can be modified to alter DNA-binding affinity, and the protein-protein interface can be modified to affect dimer formation. Such a recombinant monomer can be combined with a similarly modified monomer to produce a rationally-designed meganuclease heterodimer with desired sequence specificity and activity.

In another aspect, the invention provides for various methods of use for the rationally-designed meganucleases described and enabled herein. These methods include producing genetically-modified cells and organisms, treating diseases by gene therapy, treating pathogen infections, and using the recombinant meganucleases for in vitro applications for diagnostics and research.

Thus, in one aspect, the invention provides methods for producing a genetically-modified eukaryotic cell including an exogenous sequence of interest inserted in a chromosome, by transfecting the cell with (i) a first nucleic acid sequence encoding a meganuclease of the invention, and (ii) a second nucleic acid sequence including said sequence of interest, wherein the meganuclease produces a cleavage site in the chromosome and the sequence of interest is inserted into the chromosome at the cleavage site either by homologous recombination or non-homologous end-joining.

Alternatively, in another aspect, the invention provides methods for producing a genetically-modified eukaryotic cell including an exogenous sequence of interest inserted in a chromosome, by introducing a meganuclease protein of the invention into the cell, and transfecting the cell with a nucleic acid including the sequence of interest, wherein the meganuclease produces a cleavage site in the chromosome and the sequence of interest is inserted into the chromosome at the cleavage site either by homologous recombination or non-homologous end-joining.

In another aspect, the invention provides methods for producing a genetically-modified eukaryotic cell by disrupting a target sequence in a chromosome, by transfecting the cell with a nucleic acid encoding a meganuclease of the invention, wherein the meganuclease produces a cleavage site in the chromosome and the target sequence is disrupted by non-homologous end-joining at the cleavage site.

In another aspect, the invention provides methods of producing a genetically-modified organism by producing a genetically-modified eukaryotic cell according to the methods described above, and growing the genetically-modified eukaryotic cell to produce the genetically-modified organism. In these embodiments, the eukaryotic cell can be selected from a gamete, a zygote, a blastocyst cell, an embryonic stem cell, and a protoplast cell.

In another aspect, the invention provides methods for treating a disease by gene therapy in a eukaryote, by transfecting at least one cell of the eukaryote with one or more nucleic acids including (i) a first nucleic acid sequence encoding a meganuclease of the invention, and (ii) a second nucleic acid sequence including a sequence of interest, wherein the meganuclease produces a cleavage site in the chromosome and the sequence of interest is inserted into the chromosome by homologous recombination or non-homologous end-joining, and insertion of the sequence of interest provides gene therapy for the disease.

Alternatively, in another aspect, the invention provides methods for treating a disease by gene therapy in a eukaryote, by introducing a meganuclease protein of the invention into at least one cell of the eukaryote, and transfecting the cell with a nucleic acid including a sequence of interest, wherein the meganuclease produces a cleavage site in the chromosome and the sequence of interest is inserted into the chromosome at the cleavage site by homologous recombination or non-homologous end-joining, and insertion of the sequence of interest provides gene therapy for the disease.

In another aspect, the invention provides methods for treating a disease by gene therapy in a eukaryote by disrupting a target sequence in a chromosome of the eukaryote, by transfecting at least one cell of the eukaryote with a nucleic acid encoding a meganuclease of the invention, wherein the meganuclease produces a cleavage site in the chromosome and the target sequence is disrupted by non-homologous end-joining at the cleavage site, wherein disruption of the target sequence provides the gene therapy for the disease.

In another aspect, the invention provides methods for treating a viral or prokaryotic pathogen infection in a eukaryotic host by disrupting a target sequence in a genome of the pathogen, by transfecting at least one infected cell of the host with a nucleic acid encoding a meganuclease of the invention, wherein the meganuclease produces a cleavage site in the genome and the target sequence is disrupted by either (1) non-homologous end-joining at the cleavage site or (2) homologous recombination with a second nucleic acid, and wherein disruption of the target sequence provides treatment for the infection.

More generally, in another aspect, the invention provides methods for rationally-designing recombinant meganucleases having altered specificity for at least one base position of a recognition sequence, by (1) determining at least a portion of a three-dimensional structure of a reference meganuclease-DNA complex; (2) identifying amino acid residues forming a base contact surface at the base position; (3) determining a distance between a β-carbon of at least a first residue of the contact surface and at least a first base at the base position; and (4) identifying an amino acid substitution to promote the desired change by either (a) for a first residue which is <6 Å from the first base, selecting a substitution from Group 1 and/or Group 2 which is a member of an appropriate one of Group G, Group C, Group T or Group A; or (b) for a first residue which is >6 Å from said first base, selecting a substitution from Group 2 and/or Group 3 which is a member of an appropriate one of Group G, Group C, Group T or Group A, where each of the Groups is defined herein. This method may be repeated for additional contact residues for the same base, and for contact residues for the other base at the same position, as well as for additional positions.
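Step (4) of the method above amounts to intersecting a distance-dependent set of candidate residues with a base-dependent set. The following is a minimal sketch of that selection logic only. The membership of Groups 1-3 and Groups G, C, T and A is defined elsewhere in the specification and is deliberately not reproduced here, so the caller must supply it; the function names and the dictionary layout are hypothetical. The text does not say how a distance of exactly 6 Å is handled, and this sketch assumes it falls in branch (b).

```python
def distance_groups_for(distance_angstrom):
    """Pick the distance-based groups for a contact residue.

    Branch (a): residue < 6 angstroms from the base -> Group 1 and/or Group 2.
    Branch (b): residue > 6 angstroms from the base -> Group 2 and/or Group 3.
    Assumption of this sketch: exactly 6.0 is treated as branch (b).
    """
    if distance_angstrom < 6.0:
        return ("Group 1", "Group 2")
    return ("Group 2", "Group 3")


def candidate_substitutions(distance_angstrom, desired_base, groups):
    """Return the residues allowed by both the distance rule and the base rule.

    `groups` maps group names ("Group 1", "Group G", ...) to sets of
    one-letter residue codes; its contents come from the specification's
    group definitions, which this sketch does not supply.
    """
    if desired_base not in ("G", "C", "T", "A"):
        raise ValueError(f"unknown base: {desired_base!r}")
    by_distance = set()
    for name in distance_groups_for(distance_angstrom):
        by_distance |= groups[name]
    return sorted(by_distance & groups["Group " + desired_base])
```

The same call would be repeated for each additional contact residue and base position, as the paragraph above notes.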

In addition, in another general aspect, the invention provides methods for rationally-designing a recombinant meganuclease having increased DNA-binding affinity, by (1) determining at least a portion of a three-dimensional structure of a reference meganuclease-DNA complex; (2) identifying amino acid contact residues forming a backbone contact surface; and (3) identifying an amino acid substitution to increase the DNA-binding affinity by (a) for a contact residue having a negatively-charged or hydrophobic side chain, selecting a substitution having an uncharged/polar or positively-charged side chain; or (b) for a contact residue having an uncharged/polar side chain, selecting a substitution having a positively-charged side chain. Conversely, the invention also provides methods for rationally-designing a recombinant meganuclease having decreased DNA-binding affinity, by (1) determining at least a portion of a three-dimensional structure of a reference meganuclease-DNA complex; (2) identifying amino acid contact residues forming a backbone contact surface; and (3) identifying an amino acid substitution to decrease the DNA-binding affinity by (a) for a contact residue having a positively-charged side chain, selecting a substitution having an uncharged/polar or negatively-charged side chain; or (b) for a contact residue having a hydrophobic or uncharged/polar side chain, selecting a substitution having a negatively-charged side chain.
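The substitution rules for backbone contact residues can be expressed as a small lookup. The following is a minimal sketch under stated assumptions: the side-chain classes are inferred from the substitution lists given earlier in this summary (D and E as negatively-charged; K and R as positively-charged; H, N, Q, S and T as uncharged/polar; G, I, L, P, V and Y as the residues grouped with the hydrophobic case), residues that do not appear in those lists are left unclassified, and the function and constant names are hypothetical.

```python
# Side-chain classes inferred from the substitution lists in this summary.
# Residues not appearing in those lists (e.g., A, C, F, M, W) are left
# unclassified in this sketch.
NEGATIVE = set("DE")
POSITIVE = set("KR")
POLAR = set("HNQST")          # uncharged/polar
HYDROPHOBIC = set("GILPVY")


def backbone_substitutions(residue, goal):
    """Return candidate replacements for a backbone contact residue.

    goal is "increase" or "decrease" (DNA-binding affinity). An empty
    list means the rules above give no substitution for that residue.
    """
    if goal == "increase":
        # (a) negatively-charged or hydrophobic -> polar or positive
        if residue in NEGATIVE or residue in HYDROPHOBIC:
            return sorted(POLAR | POSITIVE)
        # (b) uncharged/polar -> positive
        if residue in POLAR:
            return sorted(POSITIVE)
    elif goal == "decrease":
        # (a) positively-charged -> polar or negative
        if residue in POSITIVE:
            return sorted(POLAR | NEGATIVE)
        # (b) hydrophobic or uncharged/polar -> negative
        if residue in HYDROPHOBIC or residue in POLAR:
            return sorted(NEGATIVE)
    else:
        raise ValueError("goal must be 'increase' or 'decrease'")
    return []
```

Applied to I-CreI, for instance, the sketch returns H, K, N, Q, R, S and T for E80 when the goal is increased affinity, and D and E for T46 when the goal is decreased affinity, matching the lists recited above for that enzyme.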

These and other aspects and embodiments of the invention will be apparent to one of ordinary skill in the art based upon the following detailed description of the invention.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1(A) illustrates the interactions between the I-CreI homodimer and its naturally-occurring double-stranded recognition sequence, based upon crystallographic data. This schematic representation depicts the recognition sequence (SEQ ID NO: 2 and SEQ ID NO: 3), shown as unwound for illustration purposes only, bound by the homodimer, shown as two ovals. The bases of each DNA half-site are numbered −1 through −9, and the amino acid residues of I-CreI which form the recognition surface are indicated by one-letter amino acid designations and numbers indicating residue position. Solid black lines: hydrogen bonds to DNA bases. Dashed lines: amino acid positions that form additional contacts in enzyme designs but do not contact the DNA in the wild-type complex. Arrows: residues that interact with the DNA backbone and influence cleavage activity.

FIG. 1(B) illustrates the wild-type contacts between the A-T base pair at position −4 of the cleavage half-site on the right side of FIG. 1(A). Specifically, the residue Q26 is shown to interact with the A base. Residue I77 is in proximity to the base pair but not specifically interacting.

FIG. 1(C) illustrates the interactions between a rationally-designed variant of the I-CreI meganuclease in which residue I77 has been modified to E77. As a result of this change, a G-C base pair is preferred at position −4. The interaction between Q26 and the G base is mediated by a water molecule, as has been observed crystallographically for the cleavage half-site on the left side of FIG. 1(A).

FIG. 1(D) illustrates the interactions between a rationally-designed variant of the I-CreI meganuclease in which residue Q26 has been modified to E26 and residue I77 has been modified to R77. As a result of this change, a C-G base pair is preferred at position −4.

FIG. 1(E) illustrates the interactions between a rationally-designed variant of the I-CreI meganuclease in which residue Q26 has been modified to A26 and residue I77 has been modified to Q77. As a result of this change, a T-A base pair is preferred at position −4.

FIG. 2(A) shows a comparison of one recognition sequence for each of the wild type I-CreI meganuclease (WT) and 11 rationally-designed meganuclease heterodimers of the invention. Bases that are conserved relative to the WT recognition sequence are shaded. The 9 bp half-sites are bolded. WT: wild-type (SEQ ID NO: 4); CF: ΔF508 allele of the human CFTR gene responsible for most cases of cystic fibrosis (SEQ ID NO: 25); MYD: the human DM kinase gene associated with myotonic dystrophy (SEQ ID NO: 27); CCR: the human CCR5 gene (a major HIV co-receptor) (SEQ ID NO: 26); ACH: the human FGFR3 gene correlated with achondroplasia (SEQ ID NO: 23); TAT: the HIV-1 TAT/REV gene (SEQ ID NO: 15); HSV: the HSV-1 UL36 gene (SEQ ID NO: 28); LAM: the bacteriophage λ p05 gene (SEQ ID NO: 22); PDX: the Variola (smallpox) virus gp009 gene (SEQ ID NO: 30); URA: the Saccharomyces cerevisiae URA3 gene (SEQ ID NO: 36); GLA: the Arabidopsis thaliana GL2 gene (SEQ ID NO: 32); BRP: the Arabidopsis thaliana BP-1 gene (SEQ ID NO: 33).

FIG. 2(B) illustrates the results of incubation of each of wild-type I-CreI (WT) and 11 rationally-designed meganuclease heterodimers with plasmids harboring the recognition sites for all 12 enzymes for 6 hours at 37° C. Percent cleavage is indicated in each box.

FIG. 3 illustrates cleavage patterns of wild-type and rationally-designed I-CreI homodimers. (A) wild type I-CreI. (B) I-CreI K116D. (C-L) rationally-designed meganucleases of the invention. Enzymes were incubated with a set of plasmids harboring palindromes of the intended cleavage half-site and the 27 corresponding single-base pair variations. Bar graphs show fractional cleavage (F) in 4 hours at 37° C. Black bars: expected cleavage patterns based on Table 1. Gray bars: DNA sites that deviate from expected cleavage patterns. White circles indicate bases in the intended recognition site. Also shown are cleavage time-courses over two hours. The open circle time-course plots in C and L correspond to cleavage by the CCR1 and BRP2 enzymes lacking the E80Q mutation. The cleavage sites correspond to the 5′ (left column) and 3′ (right column) half-sites for the heterodimeric enzymes described in FIG. 2(A).

DETAILED DESCRIPTION OF THE INVENTION

1.1 Introduction

The present invention is based, in part, upon the identification and characterization of specific amino acids in the LAGLIDADG family of meganucleases that make specific contacts with DNA bases and non-specific contacts with the DNA backbone when the meganucleases associate with a double-stranded DNA recognition sequence, and which thereby affect the recognition sequence specificity and DNA-binding affinity of the enzymes. This discovery has been used, as described in detail below, to identify amino acid substitutions in the meganucleases that can alter the specificity and/or affinity of the enzymes, and to rationally design and develop meganucleases that can recognize a desired DNA sequence that naturally-occurring meganucleases do not recognize, and/or that have increased or decreased specificity and/or affinity relative to the naturally-occurring meganucleases. Furthermore, because DNA-binding affinity affects enzyme activity as well as sequence-specificity, the invention provides rationally-designed meganucleases with altered activity relative to naturally-occurring meganucleases. In addition, the invention provides rationally-designed meganucleases in which residues at the interface between the monomers associated to form a dimer have been modified in order to promote heterodimer formation. Finally, the invention provides uses for the rationally-designed meganucleases in the production of recombinant cells and organisms, as well as in gene therapy, anti-pathogen, anti-cancer, and in vitro applications, as disclosed herein.

As a general matter, the invention provides methods for generating rationally-designed LAGLIDADG meganucleases containing altered amino acid residues at sites within the meganuclease that are responsible for (1) sequence-specific binding to individual bases in the double-stranded DNA recognition sequence, or (2) non-sequence-specific binding to the phosphodiester backbone of a double-stranded DNA molecule. Because enzyme activity is correlated to DNA-binding affinity, however, altering the amino acids involved in binding to the DNA recognition sequence can alter not only the specificity of the meganuclease through specific base pair interactions, but also the activity of the meganuclease by increasing or decreasing overall binding affinity for the double-stranded DNA. Similarly, altering the amino acids involved in binding to the DNA backbone can alter not only the activity of the enzyme, but also the degree of specificity or degeneracy of binding to the recognition sequence by increasing or decreasing overall binding affinity for the double-stranded DNA.

As described in detail below, the methods of rationally-designing meganucleases include the identification of the amino acids responsible for DNA recognition/binding, and the application of a series of rules for selecting appropriate amino acid changes. With respect to meganuclease sequence specificity, the rules include both steric considerations relating to the distances in a meganuclease-DNA complex between the amino acid side chains of the meganuclease and the bases in the sense and anti-sense strands of the DNA, and considerations relating to the non-covalent chemical interactions between functional groups of the amino acid side chains and the desired DNA base at the relevant position.

Finally, a majority of natural meganucleases that bind DNA as homodimers recognize pseudo- or completely palindromic recognition sequences. Because lengthy palindromes are expected to be rare, the likelihood of encountering a palindromic sequence at a genomic site of interest is exceedingly low. Consequently, if these enzymes are to be redesigned to recognize genomic sites of interest, it is necessary to design two enzyme monomers recognizing different half-sites that can heterodimerize to cleave the non-palindromic hybrid recognition sequence. Therefore, in some aspects, the invention provides rationally-designed meganucleases in which monomers differing by at least one amino acid position are dimerized to form heterodimers. In some cases, both monomers are rationally-designed to form a heterodimer which recognizes a non-palindromic recognition sequence. A mixture of two different monomers can result in up to three active forms of meganuclease dimer: the two homodimers and the heterodimer. In addition or alternatively, in some cases, amino acid residues are altered at the interfaces at which monomers can interact to form dimers, in order to increase or decrease the likelihood of formation of homodimers or heterodimers.

Thus, in one aspect, the invention provides methods for rationally designing LAGLIDADG meganucleases containing amino acid changes that alter the specificity and/or activity of the enzymes. In another aspect, the invention provides the rationally-designed meganucleases resulting from these methods. In another aspect, the invention provides methods that use such rationally-designed meganucleases to produce recombinant nucleic acids and organisms in which a desired DNA sequence or genetic locus within the genome of an organism is modified by the insertion, deletion, substitution or other manipulation of DNA sequences. In another aspect, the invention provides methods for reducing the survival of pathogens or cancer cells using rationally-designed meganucleases which have pathogen-specific or cancer-specific recognition sequences.

1.2 References and Definitions

The patent and scientific literature referred to herein establishes knowledge that is available to those of skill in the art. The issued U.S. patents, allowed applications, published foreign applications, and references, including GenBank database sequences, that are cited herein are hereby incorporated by reference to the same extent as if each was specifically and individually indicated to be incorporated by reference.

As used herein, the term “meganuclease” refers to an endonuclease that binds double-stranded DNA at a recognition sequence that is greater than 12 base pairs. Naturally-occurring meganucleases can be monomeric (e.g., I-SceI) or dimeric (e.g., I-CreI). The term meganuclease, as used herein, can be used to refer to monomeric meganucleases, dimeric meganucleases, or to the monomers which associate to form a dimeric meganuclease. The term “homing endonuclease” is synonymous with the term “meganuclease.”

As used herein, the term “LAGLIDADG meganuclease” refers either to meganucleases including a single LAGLIDADG motif, which are naturally dimeric, or to meganucleases including two LAGLIDADG motifs, which are naturally monomeric. The term “mono-LAGLIDADG meganuclease” is used herein to refer to meganucleases including a single LAGLIDADG motif, and the term “di-LAGLIDADG meganuclease” is used herein to refer to meganucleases including two LAGLIDADG motifs, when it is necessary to distinguish between the two. Each of the two structural domains of a di-LAGLIDADG meganuclease which includes a LAGLIDADG motif can be referred to as a LAGLIDADG subunit.

As used herein, the term “rationally-designed” means non-naturally occurring and/or genetically engineered. The rationally-designed meganucleases of the invention differ from wild-type or naturally-occurring meganucleases in their amino acid sequence or primary structure, and may also differ in their secondary, tertiary or quaternary structure. In addition, the rationally-designed meganucleases of the invention also differ from wild-type or naturally-occurring meganucleases in recognition sequence-specificity and/or activity.

As used herein, with respect to a protein, the term “recombinant” means having an altered amino acid sequence as a result of the application of genetic engineering techniques to nucleic acids which encode the protein, and cells or organisms which express the protein. With respect to a nucleic acid, the term “recombinant” means having an altered nucleic acid sequence as a result of the application of genetic engineering techniques. Genetic engineering techniques include, but are not limited to, PCR and DNA cloning technologies; transfection, transformation and other gene transfer technologies; homologous recombination; site-directed mutagenesis; and gene fusion. In accordance with this definition, a protein having an amino acid sequence identical to a naturally-occurring protein, but produced by cloning and expression in a heterologous host, is not considered recombinant.

As used herein with respect to recombinant proteins, the term “modification” means any insertion, deletion or substitution of an amino acid residue in the recombinant sequence relative to a reference sequence (e.g., a wild-type).

As used herein, the term “genetically-modified” refers to a cell or organism in which, or in an ancestor of which, a genomic DNA sequence has been deliberately modified by recombinant technology. As used herein, the term “genetically-modified” encompasses the term “transgenic.”

As used herein, the term “wild-type” refers to any naturally-occurring form of a meganuclease. The term “wild-type” is not intended to mean the most common allelic variant of the enzyme in nature but, rather, any allelic variant found in nature. Wild-type meganucleases are distinguished from recombinant or non-naturally-occurring meganucleases.

As used herein, the term “recognition sequence half-site” or simply “half site” means a nucleic acid sequence in a double-stranded DNA molecule which is recognized by a monomer of a mono-LAGLIDADG meganuclease or by one LAGLIDADG subunit of a di-LAGLIDADG meganuclease.

As used herein, the term “recognition sequence” refers to a pair of half-sites which is bound and cleaved by either a mono-LAGLIDADG meganuclease dimer or a di-LAGLIDADG meganuclease monomer. The two half-sites may or may not be separated by base pairs that are not specifically recognized by the enzyme. In the cases of I-CreI, I-MsoI and I-CeuI, the recognition sequence half-site of each monomer spans 9 base pairs, and the two half-sites are separated by four base pairs which are not recognized specifically but which constitute the actual cleavage site (which has a 4 base pair overhang). Thus, the combined recognition sequences of the I-CreI, I-MsoI and I-CeuI meganuclease dimers normally span 22 base pairs, including two 9 base pair half-sites flanking a 4 base pair cleavage site. The base pairs of each half-site are designated −9 through −1, with the −9 position being most distal from the cleavage site and the −1 position being adjacent to the 4 central base pairs, which are designated N₁-N₄. The strand of each half-site which is oriented 5′ to 3′ in the direction from −9 to −1 (i.e., towards the cleavage site), is designated the “sense” strand and the opposite strand is designated the “antisense strand”, although neither strand may encode protein. Thus, the “sense” strand of one half-site is the antisense strand of the other half-site. See, for example, FIG. 1(A). In the case of the I-SceI meganuclease, which is a di-LAGLIDADG meganuclease monomer, the recognition sequence is an approximately 18 bp non-palindromic sequence, and there are no central base pairs which are not specifically recognized. By convention, one of the two strands is referred to as the “sense” strand and the other the “antisense” strand, although neither strand may encode protein.
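The 9 + 4 + 9 layout and the −9 through −1 numbering described above can be made concrete with a short sketch. The 22 bp sequence used here is an arbitrary, made-up example (not any sequence of the SEQ ID NOs), and the function names are hypothetical. The right-hand half-site is read from the opposite strand so that both half-sites are reported on their own “sense” strands, 5′ to 3′ toward the cleavage site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def split_recognition_sequence(top_strand):
    """Split a 22 bp site into (left half-site, N1-N4, right half-site).

    Each half-site is returned on its own sense strand, i.e. read 5' to 3'
    from position -9 to position -1.
    """
    if len(top_strand) != 22:
        raise ValueError("expected 22 bp: two 9 bp half-sites flanking 4 bp")
    left = top_strand[:9]
    center = top_strand[9:13]
    right = reverse_complement(top_strand[13:])
    return left, center, right


def number_half_site(half_site):
    """Map positions -9 .. -1 to the bases of a 9 bp half-site."""
    return dict(zip(range(-9, 0), half_site))


# Arbitrary illustrative sequence, top strand written 5' to 3'.
site = "GATTACAGTACGTACTGTAATC"
left, center, right = split_recognition_sequence(site)
print(left, center, right)
print(number_half_site(left))
```

In this made-up example the two half-sites come out identical, which is the palindromic case; a heterodimer target would generally give two different 9 bp strings.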

As used herein, the term “specificity” means the ability of a meganuclease to recognize and cleave double-stranded DNA molecules only at a particular sequence of base pairs referred to as the recognition sequence, or only at a particular set of recognition sequences. The set of recognition sequences will share certain conserved positions or sequence motifs, but may be degenerate at one or more positions. A highly-specific meganuclease is capable of cleaving only one or a very few recognition sequences. Specificity can be determined in a cleavage assay as described in Example 1. As used herein, a meganuclease has “altered” specificity if it binds to and cleaves a recognition sequence which is not bound to and cleaved by a reference meganuclease (e.g., a wild-type) or if the rate of cleavage of a recognition sequence is increased or decreased by a statistically significant (p<0.05) amount relative to a reference meganuclease.

As used herein, the term “degeneracy” means the opposite of “specificity.” A highly-degenerate meganuclease is capable of cleaving a large number of divergent recognition sequences. A meganuclease can have sequence degeneracy at a single position within a half-site or at multiple, even all, positions within a half-site. Such sequence degeneracy can result from (i) the inability of any amino acid in the DNA-binding domain of a meganuclease to make a specific contact with any base at one or more positions in the recognition sequence, (ii) the ability of one or more amino acids in the DNA-binding domain of a meganuclease to make specific contacts with more than one base at one or more positions in the recognition sequence, and/or (iii) sufficient non-specific DNA binding affinity for activity. A “completely” degenerate position can be occupied by any of the four bases and can be designated with an “N” in a half-site. A “partially” degenerate position can be occupied by two or three of the four bases (e.g., either purine (Pu), either pyrimidine (Py), or not G).
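As a rough illustration of complete and partial degeneracy, a half-site can be written as a pattern of degenerate symbols and tested against concrete sequences. The sketch below uses the standard IUPAC one-letter codes R (purine), Y (pyrimidine), H (not G) and N (any base) in place of the Pu/Py notation of the text; the pattern itself and the function names are invented for the example.

```python
import math

# Subset of the IUPAC nucleotide codes relevant to the examples in the text.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # either purine
    "Y": "CT",    # either pyrimidine
    "H": "ACT",   # not G
    "N": "ACGT",  # completely degenerate
}


def matches(pattern, half_site):
    """True if every base of half_site is allowed by the pattern symbol at that position."""
    if len(pattern) != len(half_site):
        return False
    return all(base in CODES[symbol] for symbol, base in zip(pattern, half_site))


def count_sites(pattern):
    """Number of distinct concrete half-sites covered by a degenerate pattern."""
    return math.prod(len(CODES[symbol]) for symbol in pattern)


pattern = "GNRYACGTH"  # hypothetical 9 bp half-site pattern
print(matches(pattern, "GAGCACGTA"))
print(matches(pattern, "GAGCACGTG"))
print(count_sites(pattern))
```

The count grows multiplicatively with each degenerate position, which is one way to see why a highly-degenerate enzyme can cleave a large number of divergent sequences.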

As used herein with respect to meganucleases, the term “DNA-binding affinity” or “binding affinity” means the tendency of a meganuclease to non-covalently associate with a reference DNA molecule (e.g., a recognition sequence or an arbitrary sequence). Binding affinity is measured by a dissociation constant, K_(D) (e.g., the K_(D) of I-CreI for the WT recognition sequence is approximately 0.1 nM). As used herein, a meganuclease has “altered” binding affinity if the K_(D) of the recombinant meganuclease for a reference recognition sequence is increased or decreased by a statistically significant (p<0.05) amount relative to a reference meganuclease.
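To give a feel for what a K_(D) of this magnitude implies, the following sketch computes the fraction of enzyme bound to DNA under a simple 1:1 equilibrium binding model with the DNA in excess over the enzyme. That model is an assumption adopted only for illustration and is not a description of how K_(D) is measured herein; the function name is hypothetical.

```python
def fraction_bound(dna_conc_nM, kd_nM):
    """Fraction of enzyme in complex with DNA for simple 1:1 binding.

    Assumes free DNA concentration is approximately the total DNA concentration
    (DNA in excess over enzyme), so that fraction = [DNA] / (K_D + [DNA]).
    """
    return dna_conc_nM / (kd_nM + dna_conc_nM)


KD_WT = 0.1  # nM, the approximate value given in the text for I-CreI and its WT site

for dna in (0.01, 0.1, 1.0, 10.0):
    print(f"[DNA] = {dna:>5} nM -> fraction bound = {fraction_bound(dna, KD_WT):.3f}")
```

Under this model, half of the enzyme is bound when the DNA concentration equals K_(D), so a lower K_(D) corresponds to tighter binding.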

As used herein with respect to meganuclease monomers, the term “affinity for dimer formation” means the tendency of a meganuclease monomer to non-covalently associate with a reference meganuclease monomer. The affinity for dimer formation can be measured with the same monomer (i.e., homodimer formation) or with a different monomer (i.e., heterodimer formation) such as a reference wild-type meganuclease. Binding affinity is measured by a dissociation constant, K_(D). As used herein, a meganuclease has “altered” affinity for dimer formation if the K_(D) of the recombinant meganuclease monomer for a reference meganuclease monomer is increased or decreased by a statistically significant (p<0.05) amount relative to a reference meganuclease monomer.

As used herein, the term “palindromic” refers to a recognition sequence consisting of inverted repeats of identical half-sites. In this case, however, the palindromic sequence need not be palindromic with respect to the four central base pairs, which are not contacted by the enzyme. In the case of dimeric meganucleases, palindromic DNA sequences are recognized by homodimers in which the two monomers make contacts with identical half-sites.

As used herein, the term “pseudo-palindromic” refers to a recognition sequence consisting of inverted repeats of non-identical or imperfectly palindromic half-sites. In this case, the pseudo-palindromic sequence not only need not be palindromic with respect to the four central base pairs, but also can deviate from a palindromic sequence between the two half-sites. Pseudo-palindromic DNA sequences are typical of the natural DNA sites recognized by wild-type homodimeric meganucleases in which two identical enzyme monomers make contacts with different half-sites.

As used herein, the term “non-palindromic” refers to a recognition sequence composed of two unrelated half-sites of a meganuclease. In this case, the non-palindromic sequence need not be palindromic with respect to either the four central base pairs or the two monomer half-sites. Non-palindromic DNA sequences are recognized by either di-LAGLIDADG meganucleases, highly degenerate mono-LAGLIDADG meganucleases (e.g., I-CeuI) or by heterodimers of mono-LAGLIDADG meganuclease monomers that recognize non-identical half-sites.
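The distinction drawn in the three preceding definitions turns on how closely the two half-sites match once the four central base pairs are ignored. A minimal sketch of that comparison for a 22 bp site follows. The sequences are arbitrary examples and the function names are hypothetical. Because the text sets no numerical cut-off between “pseudo-palindromic” and “non-palindromic”, the sketch reports only the mismatch count and whether the site is palindromic.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def half_site_mismatches(top_strand):
    """Count positions (-9 .. -1) at which the two half-sites of a 22 bp site differ.

    The four central base pairs are ignored, as in the definitions above.
    """
    if len(top_strand) != 22:
        raise ValueError("expected a 22 bp recognition sequence")
    left = top_strand[:9]
    right = reverse_complement(top_strand[13:])
    return sum(a != b for a, b in zip(left, right))


def is_palindromic(top_strand):
    """True if the half-sites are identical inverted repeats."""
    return half_site_mismatches(top_strand) == 0


# Arbitrary illustrative sequences (top strand, 5' to 3').
print(is_palindromic("GATTACAGTACGTACTGTAATC"))        # identical half-sites
print(is_palindromic("GATTACAGTAAAAACTGTAATC"))        # center differs, half-sites do not
print(half_site_mismatches("CATTACAGTACGTACTGTAATC"))  # one position differs
```

A site with a small number of mismatches would correspond to the pseudo-palindromic case, and a site whose half-sites are unrelated to the non-palindromic case.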

As used herein, the term “activity” refers to the rate at which a meganuclease of the invention cleaves a particular recognition sequence. Such activity is a measurable enzymatic reaction, involving the hydrolysis of phosphodiester bonds of double-stranded DNA. The activity of a meganuclease acting on a particular DNA substrate is affected by the affinity or avidity of the meganuclease for that particular DNA substrate which is, in turn, affected by both sequence-specific and non-sequence-specific interactions with the DNA.

As used herein, the term “homologous recombination” refers to the natural, cellular process in which a double-stranded DNA-break is repaired using a homologous DNA sequence as the repair template (see, e.g., Cahill et al. (2006), Front. Biosci. 11:1958-1976). The homologous DNA sequence may be an endogenous chromosomal sequence or an exogenous nucleic acid that was delivered to the cell. Thus, in some embodiments, a rationally-designed meganuclease is used to cleave a recognition sequence within a target sequence and an exogenous nucleic acid with homology to or substantial sequence similarity with the target sequence is delivered into the cell and used as a template for repair by homologous recombination. The DNA sequence of the exogenous nucleic acid, which may differ significantly from the target sequence, is thereby incorporated into the chromosomal sequence. The process of homologous recombination occurs primarily in eukaryotic organisms. The term “homology” is used herein as equivalent to “sequence similarity” and is not intended to require identity by descent or phylogenetic relatedness.

As used herein, the term “non-homologous end-joining” refers to the natural, cellular process in which a double-stranded DNA-break is repaired by the direct joining of two non-homologous DNA segments (see, e.g., Cahill et al. (2006), Front. Biosci. 11:1958-1976). DNA repair by non-homologous end-joining is error-prone and frequently results in the untemplated addition or deletion of DNA sequences at the site of repair. Thus, in certain embodiments, a rationally-designed meganuclease can be used to produce a double-stranded break at a meganuclease recognition sequence within a target sequence to disrupt a gene (e.g., by introducing base insertions, base deletions, or frameshift mutations) by non-homologous end-joining. In other embodiments, an exogenous nucleic acid lacking homology to or substantial sequence similarity with the target sequence may be captured at the site of a meganuclease-stimulated double-stranded DNA break by non-homologous end-joining (see, e.g., Salomon, et al. (1998), EMBO J. 17:6086-6095). The process of non-homologous end-joining occurs in both eukaryotes and prokaryotes such as bacteria.

As used herein, the term “sequence of interest” means any nucleic acid sequence, whether it codes for a protein, RNA, or regulatory element (e.g., an enhancer, silencer, or promoter sequence), that can be inserted into a genome or used to replace a genomic DNA sequence using a meganuclease protein. Sequences of interest can have heterologous DNA sequences that allow for tagging a protein or RNA that is expressed from the sequence of interest. For instance, a protein can be tagged with tags including, but not limited to, an epitope (e.g., c-myc, FLAG) or other ligand (e.g., poly-His). Furthermore, a sequence of interest can encode a fusion protein, according to techniques known in the art (see, e.g., Ausubel et al., Current Protocols in Molecular Biology, Wiley 1999). In some embodiments, the sequence of interest is flanked by a DNA sequence that is recognized by the recombinant meganuclease for cleavage. Thus, the flanking sequences are cleaved allowing for proper insertion of the sequence of interest into genomic recognition sequences cleaved by the recombinant meganuclease. In some embodiments, the entire sequence of interest is homologous to or has substantial sequence similarity with a target sequence in the genome such that homologous recombination effectively replaces the target sequence with the sequence of interest. In other embodiments, the sequence of interest is flanked by DNA sequences with homology to or substantial sequence similarity with the target sequence such that homologous recombination inserts the sequence of interest within the genome at the locus of the target sequence. In some embodiments, the sequence of interest is substantially identical to the target sequence except for mutations or other modifications in the meganuclease recognition sequence such that the meganuclease cannot cleave the target sequence after it has been modified by the sequence of interest.

As used herein with respect to both amino acid sequences and nucleic acid sequences, the terms “percentage similarity” and “sequence similarity” refer to a measure of the degree of similarity of two sequences based upon an alignment of the sequences which maximizes similarity between aligned amino acid residues or nucleotides, and which is a function of the number of identical or similar residues or nucleotides, the number of total residues or nucleotides, and the presence and length of gaps in the sequence alignment. A variety of algorithms and computer programs are available for determining sequence similarity using standard parameters. As used herein, sequence similarity is measured using the BLASTp program for amino acid sequences and the BLASTn program for nucleic acid sequences, both of which are available through the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/), and are described in, for example, Altschul et al. (1990), J. Mol. Biol. 215:403-410; Gish and States (1993), Nature Genet. 3:266-272; Madden et al. (1996), Meth. Enzymol. 266:131-141; Altschul et al. (1997), Nucleic Acids Res. 25:3389-3402; Zhang et al. (2000), J. Comput. Biol. 7(1-2):203-14. As used herein, percent similarity of two amino acid sequences is the score based upon the following parameters for the BLASTp algorithm: word size=3; gap opening penalty=−11; gap extension penalty=−1; and scoring matrix=BLOSUM62. As used herein, percent similarity of two nucleic acid sequences is the score based upon the following parameters for the BLASTn algorithm: word size=11; gap opening penalty=−5; gap extension penalty=−2; match reward=1; and mismatch penalty=−3.
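For readers who run these comparisons locally, the stated parameters can be expressed as argument lists for the NCBI BLAST+ command-line programs. This is only a sketch under stated assumptions: it assumes the BLAST+ `blastp` and `blastn` executables, whose `-gapopen` and `-gapextend` options take the gap costs as positive numbers (so a penalty of −11 is passed as 11); the FASTA file names and function names are hypothetical; and the sketch only builds the argument lists without running anything.

```python
def blastp_args(query_fasta, subject_fasta):
    """Argument list for a pairwise protein comparison with the stated parameters."""
    return [
        "blastp",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-word_size", "3",
        "-gapopen", "11",      # gap opening penalty of -11, given as a cost
        "-gapextend", "1",     # gap extension penalty of -1, given as a cost
        "-matrix", "BLOSUM62",
    ]


def blastn_args(query_fasta, subject_fasta):
    """Argument list for a pairwise nucleotide comparison with the stated parameters."""
    return [
        "blastn",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-word_size", "11",
        "-gapopen", "5",       # gap opening penalty of -5, given as a cost
        "-gapextend", "2",     # gap extension penalty of -2, given as a cost
        "-reward", "1",        # match reward
        "-penalty", "-3",      # mismatch penalty
    ]


# Hypothetical file names; print the commands rather than executing them.
print(" ".join(blastp_args("reference_protein.fasta", "variant_protein.fasta")))
print(" ".join(blastn_args("reference_dna.fasta", "variant_dna.fasta")))
```

Each list could be passed to `subprocess.run` on a machine where BLAST+ is installed; the output format and any further options are left to the user.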

As used herein with respect to modifications of two proteins or amino acid sequences, the term “corresponding to” is used to indicate that a specified modification in the first protein is a substitution of the same amino acid residue as in the modification in the second protein, and that the amino acid position of the modification in the first protein corresponds to or aligns with the amino acid position of the modification in the second protein when the two proteins are subjected to standard sequence alignments (e.g., using the BLASTp program). Thus, the modification of residue “X” to amino acid “A” in the first protein will correspond to the modification of residue “Y” to amino acid “A” in the second protein if residues X and Y correspond to each other in a sequence alignment, and despite the fact that X and Y may be different numbers.

As used herein, the recitation of a numerical range for a variable is intended to convey that the invention may be practiced with the variable equal to any of the values within that range. Thus, for a variable which is inherently discrete, the variable can be equal to any integer value within the numerical range, including the end-points of the range. Similarly, for a variable which is inherently continuous, the variable can be equal to any real value within the numerical range, including the end-points of the range. As an example, and without limitation, a variable which is described as having values between 0 and 2 can take the values 0, 1 or 2 if the variable is inherently discrete, and can take the values 0.0, 0.1, 0.01, 0.001, or any other real values ≥0 and ≤2 if the variable is inherently continuous.

As used herein, unless specifically indicated otherwise, the word “or” is used in the inclusive sense of “and/or” and not the exclusive sense of “either/or.”

2.1 Rationally-Designed Meganucleases with Altered Sequence-Specificity

In one aspect of the invention, methods for rationally designing recombinant LAGLIDADG family meganucleases are provided. In this aspect, recombinant meganucleases are rationally-designed by first predicting amino acid substitutions that can alter base preference at each position in the half-site. These substitutions can be experimentally validated individually or in combinations to produce meganucleases with the desired cleavage specificity.

In accordance with the invention, amino acid substitutions that can cause a desired change in base preference are predicted by determining the amino acid side chains of a reference meganuclease (e.g., a wild-type meganuclease, or a non-naturally-occurring reference meganuclease) that are able to participate in making contacts with the nucleic acid bases of the meganuclease's DNA recognition sequence and the DNA phosphodiester backbone, and the spatial and chemical nature of those contacts. These amino acids include but are not limited to side chains involved in contacting the reference DNA half-site. Generally, this determination requires having knowledge of the structure of the complex between the meganuclease and its double-stranded DNA recognition sequence, or knowledge of the structure of a highly similar complex (e.g., between the same meganuclease and an alternative DNA recognition sequence, or between an allelic or phylogenetic variant of the meganuclease and its DNA recognition sequence).

Three-dimensional structures, as described by atomic coordinates data, of a polypeptide or complex of two or more polypeptides can be obtained in several ways. For example, protein structure determinations can be made using techniques including, but not limited to, X-ray crystallography, NMR, and mass spectrometry. Another approach is to analyze databases of existing structural co-ordinates for the meganuclease of interest or a related meganuclease. Such structural data is often available from databases in the form of three-dimensional coordinates. Often this data is accessible through online databases (e.g., the RCSB Protein Data Bank at www.rcsb.org/pdb).
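Once atomic coordinates are in hand, distances between a residue's β-carbon and a DNA base can be computed directly. The sketch below reads fixed-column ATOM records in the PDB file format and returns the distance from a residue's CB atom to the nearest atom of a given base. Taking the nearest base atom as the measure of distance, the chain identifiers, the residue numbers and the coordinates are all assumptions made for illustration; the records are synthetic and do not come from any deposited structure.

```python
import math

# Atom names of the phosphate group; sugar atoms are recognized by the prime (').
PHOSPHATE_ATOMS = {"P", "OP1", "OP2", "O1P", "O2P"}


def parse_atom(line):
    """Parse the fixed-column fields of a PDB ATOM record that are needed here."""
    return {
        "name": line[12:16].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def cb_to_base_distance(pdb_lines, protein_chain, residue_number, dna_chain, base_number):
    """Distance (angstroms) from a residue's CB to the nearest atom of a DNA base."""
    atoms = [parse_atom(l) for l in pdb_lines if l.startswith("ATOM")]
    cb = next(a for a in atoms
              if a["chain"] == protein_chain and a["res_seq"] == residue_number
              and a["name"] == "CB")
    base_atoms = [a for a in atoms
                  if a["chain"] == dna_chain and a["res_seq"] == base_number
                  and a["name"] not in PHOSPHATE_ATOMS and "'" not in a["name"]]
    return min(math.dist(cb["xyz"], a["xyz"]) for a in base_atoms)


def make_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Format a synthetic ATOM record with the standard PDB column layout."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Synthetic records for demonstration only.
records = [
    make_atom(1, "CB", "GLN", "A", 26, 0.0, 0.0, 0.0),
    make_atom(2, "P", "DA", "C", 5, 1.0, 0.0, 0.0),    # backbone, ignored
    make_atom(3, "C1'", "DA", "C", 5, 0.0, 2.0, 0.0),  # sugar, ignored
    make_atom(4, "N7", "DA", "C", 5, 3.0, 4.0, 0.0),   # base atom
]
print(cb_to_base_distance(records, "A", 26, "C", 5))
```

For real structures, a dedicated structure-parsing library would usually be preferable to hand-written column slicing; the sketch is meant only to show the geometry involved.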

Structural information can be obtained experimentally by analyzing the diffraction patterns of, for example, X-rays or electrons, created by regular two- or three-dimensional arrays (e.g., crystals) of proteins or protein complexes. Computational methods are used to transform the diffraction data into three-dimensional atomic co-ordinates in space. For example, the field of X-ray crystallography has been used to generate three-dimensional structural information on many protein-DNA complexes, including meganucleases (see, e.g., Chevalier et al. (2001), Nucleic Acids Res. 29(18): 3757-3774).

Nuclear Magnetic Resonance (NMR) also has been used to determine inter-atomic distances of molecules in solution. Multi-dimensional NMR methods combined with computational methods have succeeded in determining the atomic co-ordinates of polypeptides of increasing size (see, e.g., Tzakos et al. (2006), Annu. Rev. Biophys. Biomol. Struct. 35:19-42).

Alternatively, computational modeling can be used by applying algorithms based on the known primary structures and, when available, secondary, tertiary and/or quaternary structures of the protein/DNA, as well as the known physicochemical nature of the amino acid side chains, nucleic acid bases, and bond interactions. Such methods can optionally include iterative approaches, or experimentally-derived constraints. An example of such computational software is the CNS program described in Adams et al. (1999), Acta Crystallogr. D. Biol. Crystallogr. 55 (Pt 1): 181-90. A variety of other computational programs have been developed that predict the spatial arrangement of amino acids in a protein structure and predict the interaction of the amino acid side chains of the protein with various target molecules (see, e.g., U.S. Pat. No. 6,988,041).

Thus, in some embodiments of the invention, computational models are used to identify amino acid residues that specifically interact with DNA nucleic acid bases and/or facilitate non-specific phosphodiester backbone interactions. For instance, computer models of the totality of the potential meganuclease-DNA interaction can be produced using a suitable software program, including, but not limited to, MOLSCRIPT™ 2.0 (Avatar Software AB, Stockholm, Sweden), the graphical display program O (Jones et al. (1991), Acta Crystallography, A47: 110), the graphical display program GRASP™ (Nicholls et al. (1991), PROTEINS, Structure, Function and Genetics 11(4): 281ff), or the graphical display program INSIGHT™ (TSI, Inc., Shoreview, Minn.). Computer hardware suitable for producing, viewing and manipulating three-dimensional structural representations of protein-DNA complexes is commercially available and well known in the art (e.g., Silicon Graphics Workstation, Silicon Graphics, Inc., Mountain View, Calif.).

Specifically, interactions between a meganuclease and its double-stranded DNA recognition sequences can be resolved using methods known in the art. For example, a representation, or model, of the three-dimensional structure of a multi-component complex structure, for which a crystal has been produced, can be determined using techniques which include molecular replacement or SIR/MIR (single/multiple isomorphous replacement) (see, e.g., Brünger (1997), Meth. Enzym. 276: 558-580; Navaza and Saludjian (1997), Meth. Enzym. 276: 581-594; Tong and Rossmann (1997), Meth. Enzym. 276: 594-611; and Bentley (1997), Meth. Enzym. 276: 611-619) and can be performed using a software program, such as AMoRe/Mosflm (Navaza (1994), Acta Cryst. A50: 157-163; CCP4 (1994), Acta Cryst. D50: 760-763) or XPLOR (see, Brünger et al. (1992), X-PLOR Version 3.1. A System for X-ray Crystallography and NMR, Yale University Press, New Haven, Conn.).

The determination of protein structure and potential meganuclease-DNA interaction allows for rational choices concerning the amino acids that can be changed to affect enzyme activity and specificity. Decisions are based on several factors regarding amino acid side chain interactions with a particular base or DNA phosphodiester backbone. Chemical interactions used to determine appropriate amino acid substitutions include, but are not limited to, van der Waals forces, steric hindrance, ionic bonding, hydrogen bonding, and hydrophobic interactions. Amino acid substitutions can be selected which either favor or disfavor specific interactions of the meganuclease with a particular base in a potential recognition sequence half-site in order to increase or decrease specificity for that sequence and, to some degree, overall binding affinity and activity. In addition, amino acid substitutions can be selected which either increase or decrease binding affinity for the phosphodiester backbone of double-stranded DNA in order to increase or decrease overall activity and, to some degree, to decrease or increase specificity.

Thus, in specific embodiments, a three-dimensional structure of a meganuclease-DNA complex is determined and a “contact surface” is defined for each base-pair in a DNA recognition sequence half-site. In some embodiments, the contact surface comprises those amino acids in the enzyme with β-carbons less than 9.0 Å from a major groove hydrogen-bond donor or acceptor on either base in the pair, and with side chains oriented toward the DNA, irrespective of whether the residues make base contacts in the wild-type meganuclease-DNA complex. In other embodiments, residues can be excluded if the residues do not make contact in the wild-type meganuclease-DNA complex, or residues can be included or excluded at the discretion of the designer to alter the number or identity of the residues considered. In one example, as described below, for base positions −2, −7, −8, and −9 of the wild-type I-CreI half-site, the contact surfaces were limited to the amino acid positions that actually interact in the wild-type enzyme-DNA complex. For positions −1, −3, −4, −5, and −6, however, the contact surfaces were defined to contain additional amino acid positions that are not involved in wild-type contacts but which could potentially contact a base if substituted with a different amino acid.

It should be noted that, although a recognition sequence half-site is typically represented with respect to only one strand of DNA, meganucleases bind in the major groove of double-stranded DNA, and make contact with nucleic acid bases on both strands. In addition, the designations of “sense” and “antisense” strands are completely arbitrary with respect to meganuclease binding and recognition. Sequence specificity at a position can be achieved either through interactions with one member of a base pair, or by a combination of interactions with both members of a base-pair. Thus, for example, in order to favor the presence of an A/T base pair at position X, where the A base is on the “sense” strand and the T base is on the “antisense” strand, residues are selected which are sufficiently close to contact the sense strand at position X and which favor the presence of an A, and/or residues are selected which are sufficiently close to contact the antisense strand at position X and which favor the presence of a T. In accordance with the invention, a residue is considered sufficiently close if the β-carbon of the residue is within 9 Å of the closest atom of the relevant base.

Thus, for example, an amino acid with a β-carbon within 9 Å of the DNA sense strand but greater than 9 Å from the antisense strand is considered for potential interactions with only the sense strand. Similarly, an amino acid with a β-carbon within 9 Å of the DNA antisense strand but greater than 9 Å from the sense strand is considered for potential interactions with only the antisense strand. Amino acids with β-carbons that are within 9 Å of both DNA strands are considered for potential interactions with either strand.
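By way of non-limiting illustration, the 9 Å strand-assignment rule described above can be sketched in Python. The function name, the argument names, and the coordinates in the usage lines are hypothetical and are not taken from any structure; the only value drawn from the text is the 9 Å cutoff, measured to the closest atom of the relevant base.

```python
import math

CUTOFF_ANGSTROM = 9.0  # "sufficiently close" threshold stated in the text


def strands_in_reach(cb_xyz, sense_base_atoms, antisense_base_atoms,
                     cutoff=CUTOFF_ANGSTROM):
    """Return the strands whose base has at least one atom within `cutoff`
    angstroms of the residue's beta-carbon.

    cb_xyz               -- (x, y, z) of the beta-carbon
    sense_base_atoms     -- list of (x, y, z) for atoms of the sense-strand base
    antisense_base_atoms -- list of (x, y, z) for atoms of the antisense-strand base
    """
    reach = set()
    # Distance to the closest atom of each base decides strand membership.
    if min(math.dist(cb_xyz, atom) for atom in sense_base_atoms) <= cutoff:
        reach.add("sense")
    if min(math.dist(cb_xyz, atom) for atom in antisense_base_atoms) <= cutoff:
        reach.add("antisense")
    return reach


# Hypothetical coordinates, for illustration only.
beta_carbon = (0.0, 0.0, 0.0)
sense_atoms = [(5.0, 0.0, 0.0), (6.5, 1.0, 0.0)]
antisense_atoms = [(12.0, 0.0, 0.0), (11.0, 3.0, 0.0)]
print(strands_in_reach(beta_carbon, sense_atoms, antisense_atoms))
```

A residue for which the returned set is empty would not be considered for base contacts at that position under this rule.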

For each contact surface, potential amino acid substitutions are selected based on their predicted ability to interact favorably with one or more of the four DNA bases. The selection process is based upon two primary criteria: (i) the size of the amino acid side chains, which will affect their steric interactions with different nucleic acid bases, and (ii) the chemical nature of the amino acid side chains, which will affect their electrostatic and bonding interactions with the different nucleic acid bases.

With respect to the size of side chains, amino acids with shorter and/or smaller side chains can be selected if an amino acid β-carbon in a contact surface is <6 Å from a base, and amino acids with longer and/or larger side chains can be selected if an amino acid β-carbon in a contact surface is >6 Å from a base. Amino acids with side chains that are intermediate in size can be selected if an amino acid β-carbon in a contact surface is 5-8 Å from a base.

The amino acids with relatively shorter and smaller side chains can be assigned to Group 1, including glycine (G), alanine (A), serine (S), threonine (T), cysteine (C), valine (V), leucine (L), isoleucine (I), aspartate (D), asparagine (N) and proline (P). Proline, however, is expected to be used less frequently because of its relative inflexibility. In addition, glycine is expected to be used less frequently because it introduces unwanted flexibility in the peptide backbone and its very small size reduces the likelihood of effective contacts when it replaces a larger residue. On the other hand, glycine can be used in some instances for promoting a degenerate position. The amino acids with side chains of relatively intermediate length and size can be assigned to Group 2, including lysine (K), methionine (M), arginine (R), glutamate (E) and glutamine (Q). The amino acids with relatively longer and/or larger side chains can be assigned to Group 3, including lysine (K), methionine (M), arginine (R), histidine (H), phenylalanine (F), tyrosine (Y), and tryptophan (W). Tryptophan, however, is expected to be used less frequently because of its relative inflexibility. In addition, the side chain flexibility of lysine, arginine, and methionine allows these amino acids to make base contacts from long or intermediate distances, warranting their inclusion in both Groups 2 and 3. These groups are also shown in tabular form below:

Group 1          Group 2          Group 3
glycine (G)      glutamine (Q)    arginine (R)
alanine (A)      glutamate (E)    histidine (H)
serine (S)       lysine (K)       phenylalanine (F)
threonine (T)    methionine (M)   tyrosine (Y)
cysteine (C)     arginine (R)     tryptophan (W)
valine (V)                        lysine (K)
leucine (L)                       methionine (M)
isoleucine (I)
aspartate (D)
asparagine (N)
proline (P)

With respect to the chemical nature of the side chains, the different amino acids are evaluated for their potential interactions with the different nucleic acid bases (e.g., van der Waals forces, ionic bonding, hydrogen bonding, and hydrophobic interactions) and residues are selected which either favor or disfavor specific interactions of the meganuclease with a particular base at a particular position in the double-stranded DNA recognition sequence half-site. In some instances, it may be desired to create a half-site with one or more complete or partial degenerate positions. In such cases, one may choose residues which favor the presence of two or more bases, or residues which disfavor one or more bases. For example, partial degenerate base recognition can be achieved by sterically hindering a pyrimidine at a sense or antisense position.

Recognition of guanine (G) bases is achieved using amino acids with basic side chains that form hydrogen bonds to N7 and O6 of the base. Cytosine (C) specificity is conferred by negatively-charged side chains which interact unfavorably with the major groove electronegative groups present on all bases except C. Thymine (T) recognition is rationally-designed using hydrophobic and van der Waals interactions between hydrophobic side chains and the major groove methyl group on the base. Adenine (A) bases are recognized using the carboxamide side chains of Asn and Gln or the hydroxyl side chain of Tyr through a pair of hydrogen bonds to N7 and N6 of the base. Lastly, His can be used to confer specificity for a purine base (A or G) by donating a hydrogen bond to N7. These straightforward rules for DNA recognition can be applied to predict contact surfaces in which one or both of the bases at a particular base-pair position are recognized through a rationally-designed contact.

Thus, based on their binding interactions with the different nucleic acid bases, and the bases which they favor at a position with which they make contact, each amino acid residue can be assigned to one or more different groups corresponding to the different bases they favor (i.e., G, C, T or A). Thus, Group G includes arginine (R), lysine (K) and histidine (H); Group C includes aspartate (D) and glutamate (E); Group T includes alanine (A), valine (V), leucine (L), isoleucine (I), cysteine (C), threonine (T), methionine (M) and phenylalanine (F); and Group A includes asparagine (N), glutamine (Q), tyrosine (Y) and histidine (H). Note that histidine appears in both Group G and Group A; that serine (S) is not included in any group but may be used to favor a degenerate position; and that proline, glycine, and tryptophan are not included in any particular group because of predominant steric considerations. These groups are also shown in tabular form below:

Group G          Group C          Group T             Group A
arginine (R)     aspartate (D)    alanine (A)         asparagine (N)
lysine (K)       glutamate (E)    valine (V)          glutamine (Q)
histidine (H)                     leucine (L)         tyrosine (Y)
                                  isoleucine (I)      histidine (H)
                                  cysteine (C)
                                  threonine (T)
                                  methionine (M)
                                  phenylalanine (F)

Thus, in accordance with the invention, in order to effect a desired change in the recognition sequence half-site of a meganuclease at a given position X, (1) determine at least the relevant portion of the three-dimensional structure of the wild-type or reference meganuclease-DNA complex and the amino acid residue side chains which define the contact surface at position X; (2) determine the distance between the β-carbon of at least one residue comprising the contact surface and at least one base of the base pair at position X; and (3)(a) for a residue which is <6 Å from the base, select a residue from Group 1 and/or Group 2 which is a member of the appropriate one of Group G, Group C, Group T or Group A to promote the desired change, and/or (b) for a residue which is >6 Å from the base, select a residue from Group 2 and/or Group 3 which is a member of the appropriate one of Group G, Group C, Group T or Group A to promote the desired change. More than one such residue comprising the contact surface can be selected for analysis and modification and, in some embodiments, each such residue is analyzed and multiple residues are modified. Similarly, the distance between the β-carbon of a residue included in the contact surface and each of the two bases of the base pair at position X can be determined and, if the residue is within 9 Å of both bases, then different substitutions can be made to affect the two bases of the pair (e.g., a residue from Group 1 to affect a proximal base on one strand, or a residue from Group 3 to affect a distal base on the other strand). Alternatively, a combination of residue substitutions capable of interacting with both bases in a pair can affect the specificity (e.g., a residue from the T Group contacting the sense strand combined with a residue from the A Group contacting the antisense strand to select for T/A). Finally, multiple alternative modifications of the residues can be validated either empirically (e.g., by producing the recombinant meganuclease and testing its sequence recognition) or computationally (e.g., by computer modeling of the meganuclease-DNA complex of the modified enzyme) to choose amongst alternatives.
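By way of non-limiting illustration, step (3) above can be sketched in Python as the intersection of a size group and a base-preference group. The group memberships are transcribed from Groups 1-3 and Groups G, C, T and A above; the function name is hypothetical, and treating a distance of exactly 6 Å as belonging to the longer-distance case is an assumption, since the text specifies only <6 Å and >6 Å.

```python
# Size groups (one-letter codes), as listed for Groups 1-3 above.
SIZE_GROUPS = {
    1: set("GASTCVLIDNP"),
    2: set("KMREQ"),
    3: set("KMRHFYW"),
}

# Base-preference groups, as listed for Groups G, C, T and A above.
BASE_GROUPS = {
    "G": set("RKH"),
    "C": set("DE"),
    "T": set("AVLICTMF"),
    "A": set("NQYH"),
}


def candidate_residues(distance, favored_base, near_limit=6.0, reach_limit=9.0):
    """Return candidate substitutions (sorted one-letter codes) for a contact
    residue whose beta-carbon lies `distance` angstroms from the base.

    Assumption: a distance of exactly `near_limit` is handled as the
    longer-distance case; the text does not address that boundary.
    """
    if distance > reach_limit:
        return []  # outside the 9 angstrom contact range described earlier
    if distance < near_limit:
        sizes = SIZE_GROUPS[1] | SIZE_GROUPS[2]
    else:
        sizes = SIZE_GROUPS[2] | SIZE_GROUPS[3]
    return sorted(sizes & BASE_GROUPS[favored_base])


print(candidate_residues(7.15, "C"))
print(candidate_residues(5.9, "T"))
```

The two distances in the usage lines are those reported later in this section for residues 77 and 26 of I-CreI at position −4. The sketch only narrows the list of candidates; choosing among them still requires the empirical or computational validation described above.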

Once one or more desired amino acid modifications of the wild-type or reference meganuclease are selected, the rationally-designed meganuclease can be produced by recombinant methods and techniques well known in the art. In some embodiments, non-random or site-directed mutagenesis techniques are used to create specific sequence modifications. Non-limiting examples of non-random mutagenesis techniques include overlapping primer PCR (see, e.g., Wang et al. (2006), Nucleic Acids Res. 34(2): 517-527), site-directed mutagenesis (see, e.g., U.S. Pat. No. 7,041,814), cassette mutagenesis (see, e.g., U.S. Pat. No. 7,041,814), and the manufacturer's protocol for the Altered Sites® II Mutagenesis Systems kit commercially available from Promega Biosciences, Inc. (San Luis Obispo, Calif.).

The recognition and cleavage of a specific DNA sequence by a rationally-designed meganuclease can be assayed by any method known by one skilled in the art (see, e.g., U.S. Pat. Pub. No. 2006/0078552). In certain embodiments, meganuclease cleavage is determined by in vitro cleavage assays. Such assays use in vitro cleavage of a polynucleotide substrate comprising the intended recognition sequence of the assayed meganuclease and, in certain embodiments, variations of the intended recognition sequence in which one or more bases in one or both half-sites have been changed to a different base. Typically, the polynucleotide substrate is a double-stranded DNA molecule comprising a target site which has been synthesized and cloned into a vector. The polynucleotide substrate can be linear or circular, and typically comprises only one recognition sequence. The meganuclease is incubated with the polynucleotide substrate under appropriate conditions, and the resulting polynucleotides are analyzed by known methods for identifying cleavage products (e.g., electrophoresis or chromatography). If there is a single recognition sequence in a linear, double-stranded DNA substrate, the meganuclease activity is detected by the appearance of two bands (products) and the disappearance of the initial full-length substrate band. In one embodiment, meganuclease activity can be assayed as described in, for example, Wang et al. (1997), Nucleic Acids Res., 25: 3767-3776.

In other embodiments, the cleavage pattern of the meganuclease is determined using in vivo cleavage assays (see, e.g., U.S. Pat. Pub. No. 2006/0078552). In particular embodiments, the in vivo test is a single-strand annealing recombination test (SSA). This kind of test is known to those of skill in the art (Rudin et al. (1989), Genetics 122: 519-534; Fishman-Lobell et al. (1992), Science 258: 480-4).

As will be apparent to one of skill in the art, additional amino acid substitutions, insertions or deletions can be made to domains of the meganuclease enzymes other than those involved in DNA recognition and binding without complete loss of activity. Substitutions can be conservative substitutions of similar amino acid residues at structurally or functionally constrained positions, or can be non-conservative substitutions at positions which are less structurally or functionally constrained. Such substitutions, insertions and deletions can be identified by one of ordinary skill in the art by routine experimentation without undue effort. Thus, in some embodiments, the recombinant meganucleases of the invention include proteins having anywhere from 85% to 99% sequence similarity (e.g., 85%, 87.5%, 90%, 92.5%, 95%, 97.5%, 99%) to a reference meganuclease sequence. With respect to each of the wild-type I-CreI, I-MsoI, I-SceI and I-CeuI proteins, the most N-terminal and C-terminal sequences are not clearly visible in X-ray crystallography studies, suggesting that these positions are not structurally or functionally constrained. Therefore, these residues can be excluded from calculation of sequence similarity, and the following reference meganuclease sequences can be used: residues 2-153 of SEQ ID NO: 1 for I-CreI, residues 6-160 of SEQ ID NO: 6 for I-MsoI, residues 3-186 of SEQ ID NO: 9 for I-SceI, and residues 5-211 of SEQ ID NO: 12 for I-CeuI.
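By way of non-limiting illustration, restricting a comparison to a reference residue range (e.g., residues 2-153 of SEQ ID NO: 1) can be sketched in Python. This is a deliberate simplification: it computes percent identity over two pre-aligned, gap-free sequences of equal length, which is only a stand-in for the sequence similarity measure referred to above. The function name and the ten-residue toy sequences are hypothetical.

```python
def percent_identity(candidate, reference, first, last):
    """Percent of identical residues over reference positions first..last
    (1-based, inclusive), ignoring residues outside that window.

    Assumption: the two sequences are already aligned without gaps, so that
    index i in one corresponds to index i in the other.
    """
    if len(candidate) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not 1 <= first <= last <= len(reference):
        raise ValueError("residue range lies outside the reference sequence")
    window = range(first - 1, last)  # convert to 0-based indices
    matches = sum(candidate[i] == reference[i] for i in window)
    return 100.0 * matches / len(window)


# Toy sequences, for illustration only.
reference = "ACDEFGHIKL"
variant = "WCDAFGHIKW"  # differs at positions 1, 4 and 10
print(percent_identity(variant, reference, 2, 9))  # terminal residues excluded
```

In the toy example, the differences at positions 1 and 10 fall outside the window and do not count against the variant.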

2.2 LAGLIDADG Family Meganucleases

The LAGLIDADG meganuclease family is composed of more than 200 members from a diverse phylogenetic group of host organisms. All members of this family have one or two copies of a highly conserved LAGLIDADG motif along with other structural motifs involved in cleavage of specific DNA sequences. Enzymes that have a single copy of the LAGLIDADG motif (i.e., mono-LAGLIDADG meganucleases) function as dimers, whereas the enzymes that have two copies of this motif (i.e., di-LAGLIDADG meganucleases) function as monomers.

All LAGLIDADG family members recognize and cleave relatively long sequences (>12 bp), leaving four-nucleotide 3′ overhangs. These enzymes also share a number of structural motifs in addition to the LAGLIDADG motif, including a similar arrangement of anti-parallel β-strands at the protein-DNA interface. Amino acids within these conserved structural motifs are responsible for interacting with the DNA bases to confer recognition sequence specificity. The overall structural similarity between some members of the family (e.g., I-CreI, I-MsoI, I-SceI and I-CeuI) has been elucidated by X-ray crystallography. Accordingly, the members of this family can be modified at particular amino acids within such structural motifs to change the overall activity or sequence-specificity of the enzymes, and corresponding modifications can reasonably be expected to have similar results in other family members. See, generally, Chevalier et al. (2001), Nucleic Acids Res. 29(18): 3757-3774.

2.2.1 Meganucleases Derived from I-CreI

In one aspect, the present invention relates to rationally-designed meganucleases which are based upon or derived from the I-CreI meganuclease of Chlamydomonas reinhardtii. The wild-type amino acid sequence of the I-CreI meganuclease is shown in SEQ ID NO: 1, which corresponds to Genbank Accession # P05725. Two recognition sequence half-sites of the wild-type I-CreI meganuclease from crystal structure PDB # 1BP7 are shown below:

Position        -9-8-7-6-5-4-3-2-1
SEQ ID NO: 2  5′-G A A A C T G T C T C A C G A C G T T T T G-3′
SEQ ID NO: 3  3′-C T T T G A C A G A G T G C T G C A A A A C-5′
Position                                  -1-2-3-4-5-6-7-8-9

Note that this natural recognition sequence is not perfectly palindromic, even outside the central four base pairs. The two recognition sequence half-sites are shown in bold on their respective sense strands.

Wild-type I-CreI also recognizes and cuts the following perfectly palindromic (except for the central N1-N4 bases) sequence:

Position        -9-8-7-6-5-4-3-2-1
SEQ ID NO: 4  5′-C A A A C T G T C G T G A G A C A G T T T G-3′
SEQ ID NO: 5  3′-G T T T G A C A G C A C T C T G T C A A A C-5′
Position                                  -1-2-3-4-5-6-7-8-9

The palindromic sequence of SEQ ID NO: 4 and SEQ ID NO: 5 is considered to be a better substrate for the wild-type I-CreI because the enzyme binds this site with higher affinity and cleaves it more efficiently than the natural DNA sequence. For the purposes of the following disclosure, and with particular regard to the experimental results presented herein, this palindromic sequence cleaved by wild-type I-CreI is referred to as “WT” (see, e.g., FIG. 2(A)). The two recognition sequence half-sites are shown in bold on their respective sense strands.
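By way of non-limiting illustration, the distinction between the natural and the palindromic I-CreI recognition sequences can be checked with a short Python sketch. The two sequences are SEQ ID NO: 2 and SEQ ID NO: 4 as shown above; the function names are hypothetical, and the layout of nine bases, four central bases, and nine bases follows the position numbering above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def half_sites(seq, half_len=9, center_len=4):
    """Split a recognition sequence into its two half-sites, each written
    5' to 3' on its own sense strand (positions -9 through -1)."""
    if len(seq) != 2 * half_len + center_len:
        raise ValueError("unexpected recognition sequence length")
    left = seq[:half_len]
    # The second half-site is read from the opposite strand.
    right = reverse_complement(seq[-half_len:])
    return left, right


def is_palindromic_outside_center(seq):
    """True if both half-sites are identical, ignoring the central bases."""
    left, right = half_sites(seq)
    return left == right


NATURAL = "GAAACTGTCTCACGACGTTTTG"  # SEQ ID NO: 2
WT = "CAAACTGTCGTGAGACAGTTTG"       # SEQ ID NO: 4

print(half_sites(NATURAL), is_palindromic_outside_center(NATURAL))
print(half_sites(WT), is_palindromic_outside_center(WT))
```

For the natural sequence the two half-sites differ at positions −9, −5 and −4, whereas for the “WT” sequence they are identical.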

FIG. 1(A) depicts the interactions of a wild-type I-CreI meganuclease homodimer with a double-stranded DNA recognition sequence; FIG. 1(B) shows the specific interactions between amino acid residues of the enzyme and bases at the −4 position of one half-site for a wild-type enzyme and one wild-type recognition sequence; and FIGS. 1(C)-(E) show the specific interactions between amino acid residues of the enzyme and bases at the −4 position of one half-site for three rationally-designed meganucleases of the invention with altered specificity at position −4 of the half-site.

Thus, the base preference at any specified base position of the half-site can be rationally altered to each of the other three base pairs using the methods disclosed herein. First, the wild-type recognition surface at the specified base position is determined (e.g., by analyzing meganuclease-DNA complex co-crystal structures; or by computer modeling of the meganuclease-DNA complexes). Second, existing and potential contact residues are determined based on the distances between the β-carbons of the surrounding amino acid positions and the nucleic acid bases on each DNA strand at the specified base position. For example, and without limitation, as shown in FIG. 1(A), the I-CreI wild-type meganuclease-DNA contact residues at position −4 involve a glutamine at position 26 which hydrogen bonds to an A base on the antisense DNA strand. Residue 77 was also identified as potentially being able to contact the −4 base on the DNA sense strand. The β-carbon of residue 26 is 5.9 Å away from N7 of the A base on the antisense DNA strand, and the β-carbon of residue 77 is 7.15 Å away from the C5-methyl of the T on the sense strand. According to the distance and base chemistry rules described herein, a C on the sense strand could hydrogen bond with a glutamic acid at position 77 and a G on the antisense strand could bond with glutamine at position 26 (mediated by a water molecule, as observed in the wild-type I-CreI crystal structure) (see FIG. 1(C)); a G on the sense strand could hydrogen bond with an arginine at position 77 and a C on the antisense strand could hydrogen bond with a glutamic acid at position 26 (see FIG. 1(D)); and an A on the sense strand could hydrogen bond with a glutamine at position 77 and a T on the antisense strand could form hydrophobic contacts with an alanine at position 26 (see FIG. 1(E)). If the base-specific contact is provided by position 77, then the wild-type contact, Q26, can be substituted (e.g., with a serine residue) to reduce or remove its influence on specificity. Alternatively, complementary mutations at positions 26 and 77 can be combined to specify a particular base pair (e.g., A26 specifies a T on the antisense strand and Q77 specifies an A on the sense strand (FIG. 1(E))). These predicted residue substitutions have all been validated experimentally.

Thus, in accordance with the invention, a substantial number of amino acid modifications to the DNA recognition domain of the I-CreI meganuclease have been identified which, singly or in combination, result in recombinant meganucleases with specificities altered at individual bases within the DNA recognition sequence half-site, such that these rationally-designed meganucleases have half-sites different from the wild-type enzyme. The amino acid modifications of I-CreI and the resulting change in recognition sequence half-site specificity are shown in Table 1:

TABLE 1
Favored Sense-Strand Base
Posn. A C G T A/T A/C A/G C/T G/T A/G/T A/C/G/T
-1 Y75 R70* K70 Q70* T46* G70 L75* H75* E70* C70 A70 C75* R75* E75* L70 S70 Y139* H46* E46* Y75* G46* C46* K46* D46* Q75* A46* R46* H75* H139 Q46* H46*
-2 Q70 E70 H70 Q44* C44* T44* D70 D44* A44* K44* E44* V44* R44* I44* L44* N44*
-3 Q68 E68 R68 M68 H68 Y68 K68 C24* F68 C68 I24* K24* L68 R24* F68
-4 A26* E77 R77 S77 S26* Q77 K26* E26* Q26*
-5 E42 R42 K28* C28* M66 Q42 K66
-6 Q40 E40 R40 C40 A40 S40 C28* R28* I40 A79 S28* V40 A28* C79 H28* I79 V79 Q28*
-7 N30* E38 K38 I38 C38 H38 Q38 K30* R38 L38 N38 R30* E30* Q30*
-8 F33 E33 F33 L33 R32* R33 Y33 D33 H33 V33 I33 F33 C33
-9 E32 R32 L32 D32 S32 K32 V32 I32 N32 A32 H32 C32 Q32 T32

Bold entries are wild-type contact residues and do not constitute “modifications” as used herein. An asterisk indicates that the residue contacts the base on the antisense strand.

2.2.2 Meganucleases Derived from I-MsoI

In another aspect, the present invention relates to rationally-designed meganucleases which are based upon or derived from the I-MsoI meganuclease of Monomastix sp. The wild-type amino acid sequence of the I-MsoI meganuclease is shown in SEQ ID NO: 6, which corresponds to Genbank Accession # AAL34387. Two recognition sequence half-sites of the wild-type I-MsoI meganuclease from crystal structure PDB # 1M5X are shown below:

Position        -9-8-7-6-5-4-3-2-1
SEQ ID NO: 7  5′-C A G A A C G T C G T G A G A C A G T T C C-3′
SEQ ID NO: 8  3′-G T C T T G C A G C A C T C T G T C A A G G-5′
Position                                  -1-2-3-4-5-6-7-8-9

Note that the recognition sequence is not perfectly palindromic, even outside the central four base pairs. The two recognition sequence half-sites are shown in bold on their respective sense strands.

In accordance with the invention, a substantial number of amino acid modifications to the DNA recognition domain of the I-MsoI meganuclease have been identified which, singly or in combination, can result in recombinant meganucleases with specificities altered at individual bases within the DNA recognition sequence half-sites, such that these rationally-designed meganucleases have recognition sequences different from the wild-type enzyme. Amino acid modifications of I-MsoI and the predicted change in recognition sequence half-site specificity are shown in Table 2:

TABLE 2
Favored Sense-Strand Base
Position A C G T
-1 K75* D77 K77 C77 Q77 E77 R77 L77 A49* K49* E49* Q79* C49* R75* E79* K79* K75* R79* K79*
-2 Q75 E75 K75 A75 K81 D75 E47* C75 C47* R47* E81* V75 I47* K47* I75 L47* K81* T75 R81* Q47* Q81*
-3 Q72 E72 R72 K72 C26* Y72 K72 Y72 L26* H26* Y26* H26* V26* K26* F26* A26* R26* I26*
-4 K28 K28* R83 K28 Q83 R28* K83 K83 E83 Q28*
-5 K28 K28* R45 Q28* C28* R28* E28* L28* I28*
-6 I30* E43 R43 K43 V30* E85 K43 I85 S30* K30* K85 V85 L30* R30* R85 L85 Q43 E30* Q30* D30*
-7 Q41 E32 R32 K32 E41 R41 M41 K41 L41 I41
-8 Y35 E32 R32 K32 K35 K32 K35 K35 R35
-9 N34 D34 K34 S34 H34 E34 R34 C34 S34 H34 V34 T34 A34

Bold entries represent wild-type contact residues and do not constitute “modifications” as used herein. An asterisk indicates that the residue contacts the base on the antisense strand.

2.2.3 Meganucleases Derived from I-SceI

In another aspect, the present invention relates to rationally-designed meganucleases which are based upon or derived from the I-SceI meganuclease of Saccharomyces cerevisiae. The wild-type amino acid sequence of the I-SceI meganuclease is shown in SEQ ID NO: 9, which corresponds to Genbank Accession # CAA09843. The recognition sequence of the wild-type I-SceI meganuclease from crystal structure PDB # 1R7M is shown below:

Sense      SEQ ID NO: 10  5′-T T A C C C T G T  T  A  T  C  C  C  T  A  G-3′
Antisense  SEQ ID NO: 11  3′-A A T G G G A C A  A  T  A  G  G  G  A  T  C-5′
Position                     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Note that the recognition sequence is non-palindromic and there are not four base pairs separating half-sites.

In accordance with the invention, a substantial number of amino acid modifications to the DNA recognition domain of the I-SceI meganuclease have been identified which, singly or in combination, can result in recombinant meganucleases with specificities altered at individual bases within the DNA recognition sequence, such that these rationally-designed meganucleases have recognition sequences different from the wild-type enzyme. The amino acid modifications of I-SceI and the predicted change in recognition sequence specificity are shown in Table 3:

TABLE 3
Favored Sense-Strand Base
Position A C G T
4 K50 R50* E50* K57 K50* R57 M57 E57 K57 Q50*
5 K48 R48* E48* Q48* Q102 K48* K102 C102 E102 R102 L102 E59 V102
6 K59 R59* K84 Q59* K59* E59* Y46
7 C46* R46* K86 K68 L46* K46* R86 C86 V46* E86 E46* L86 Q46*
8 K61* E88 E61* K88 S61* R61* R88 Q61* V61* H61* K88 H61* A61* L61*
9 T98* R98* E98* Q98* C98* K98* D98* V98* L98*
10 V96* K96* D96* Q96* C96* R96* E96* A96*
11 C90* K90* E90* Q90* L90* R90*
12 Q193 E165 K165 C165 E193 R165 L165 D193 C193 V193 A193 T193 S193
13 C193* K193* E193* Q193* L193* R193* D193* C163 D192 K163 L163 R192
14 L192* E161 K147 K161 C192* R192* K161 Q192* K192* R161 R197 D192* E192*
15 E151 K151 C151 L151 K151
17 N152* K152* N152* Q152* S152* K150* S152* Q150* C150* D152* L150* D150* V150* E150* T150*
18 K155* R155* E155* H155* C155* K155* Y155*

Bold entries are wild-type contact residues and do not constitute “modifications” as used herein. An asterisk indicates that the residue contacts the base on the antisense strand.

2.2.4 Meganucleases Derived from I-CeuI

In another aspect, the present invention relates to rationally-designed meganucleases which are based upon or derived from the I-CeuI meganuclease of Chlamydomonas eugametos. The wild-type amino acid sequence of the I-CeuI meganuclease is shown in SEQ ID NO: 12, which corresponds to Genbank Accession # P32761. Two recognition sequence half-sites of the wild-type I-CeuI meganuclease from crystal structure PDB # 2EX5 are shown below:

Position        -9-8-7-6-5-4-3-2-1
SEQ ID NO: 13 5′-A T A A C G G T C C T A A G G T A G C G A A-3′
SEQ ID NO: 14 3′-T A T T G C C A G G A T T C C A T C G C T T-5′
Position                                  -1-2-3-4-5-6-7-8-9

Note that the recognition sequence is non-palindromic, even outside the central four base pairs, despite the fact that I-CeuI is a homodimer, due to the natural degeneracy in the I-CeuI recognition interface (Spiegel et al. (2006), Structure 14: 869-80). The two recognition sequence half-sites are shown in bold on their respective sense strands.

In accordance with the invention, a substantial number of amino acid modifications to the DNA recognition domain of the I-CeuI meganuclease have been identified which, singly or in combination, result in recombinant meganucleases with specificities altered at individual bases within the DNA recognition sequence half-site, such that these rationally-designed meganucleases can have recognition sequences different from the wild-type enzyme. The amino acid modifications of I-CeuI and the predicted change in recognition sequence specificity are shown in Table 4:

TABLE 4
Favored Sense-Strand Base
Position A C G T
-1 C92* K116* E116* Q116* A92* R116* E92* Q92* V92* D116* K92*
-2 Q117 E117 K117 C117 C90* D117 R124 V117 L90* R174* K124 T117 V90* K124* E124* Q90* K90* E90* R90* D90* K68*
-3 C70* K70* E70* Q70* V70* E88* T70* L70* K70*
-4 Q126 E126 R126 K126 N126 D126 K126 L126 K88* R88* E88* Q88* L88* K88* D88* C88* K72* C72* L72* V72*
-5 C74* K74* E74* C128 L74* K128 L128 V74* R128 V128 T74* E128 T128
-6 Q86 D86 K128 K86 E86 R128 C86 R84* R86 L86 K84* K86 E84*
-7 L76* R76* E76* H76* C76* K76* R84 Q76* K76* H76*
-8 Y79 D79 R79 C79 R79 E79 K79 L79 Q76 D76 K76 V79 E76 R76 L76
-9 Q78 D78 R78 K78 N78 E78 K78 V78 H78 H78 L78 K78 C78 T78

Bold entries are wild-type contact residues and do not constitute “modifications” as used herein. An asterisk indicates that the residue contacts the base on the antisense strand.

2.2.5 Specifically-Excluded Recombinant Meganucleases

The present invention is not intended to embrace certain recombinant meganucleases which have been described in the prior art, and which have been developed by alternative methods. These excluded meganucleases include those described by Arnould et al. (2006), J. Mol. Biol. 355: 443-58; Sussman et al. (2004), J. Mol. Biol. 342: 31-41; Chames et al. (2005), Nucleic Acids Res. 33: e178; Seligman et al. (2002), Nucleic Acids Res. 30: 3870-9; and Ashworth et al. (2006), Nature 441(7093):656-659; the entire disclosures of which are hereby incorporated by reference, including recombinant meganucleases based on I-CreI with single substitutions selected from C33, R33, A44, H33, K32, F33, R32, A28, A70, E33, V33, A26, and R66. Also excluded are recombinant meganucleases based on I-CreI with three substitutions selected from A68/N70/N75 and D44/D70/N75, or with four substitutions selected from K44/T68/G60/N75 and R44/A68/T70/N75. Lastly, specifically excluded is the recombinant meganuclease based on I-MsoI with the pair of substitutions L28 and R83. These substitutions or combinations of substitutions are referred to herein as the “excluded modifications.”

2.2.6 Meganucleases with Multiple Changes in the Recognition Sequence Half-Site

In another aspect, the present invention relates to rationally-designed meganucleases which are produced by combining two or more amino acid modifications as described in sections 2.2.1-2.2.4 above, in order to alter half-site preference at two or more positions in a DNA recognition sequence half-site. For example, without limitation, and as more fully described below, the enzyme DJ1 was derived from I-CreI by incorporating the modifications R30/E38 (which favor C at position −7), R40 (which favors G at position −6), R42 (which favors G at position −5), and N32 (which favors complete degeneracy at position −9). The rationally-designed DJ1 meganuclease invariantly recognizes C₋₇ G₋₆ G₋₅ compared to the wild-type preference for A₋₇ A₋₆ C₋₅, and has increased tolerance for A at position −9.
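The DJ1 example can be sketched as a simple lookup, purely for illustration. The dictionary layout and the function name are assumptions made for this example; only the pairings of modification, half-site position and favored base are taken from the text. "N" is used here to denote complete degeneracy.

```python
# Illustrative sketch: combine position-specific modifications to predict a
# new half-site preference. The data layout and function name are assumptions
# for this example; the residue/base pairings are those stated for DJ1.

# Wild-type I-CreI preference at the three positions discussed in the text.
WILD_TYPE = {-7: "A", -6: "A", -5: "C"}

# Each modification maps to (half-site position, favored base).
# "N" denotes complete degeneracy (any base tolerated).
DJ1_MODIFICATIONS = {
    "R30/E38": (-7, "C"),
    "R40": (-6, "G"),
    "R42": (-5, "G"),
    "N32": (-9, "N"),
}


def apply_modifications(wild_type, modifications):
    """Return a copy of the preference map with each modification applied."""
    preference = dict(wild_type)
    for position, base in modifications.values():
        preference[position] = base
    return preference


dj1 = apply_modifications(WILD_TYPE, DJ1_MODIFICATIONS)
print([(pos, dj1[pos]) for pos in sorted(dj1)])
```

Treating each modification as an independent entry mirrors the modularity described in this section; it does not model incompatible combinations.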

The ability to combine residue substitutions that affect different base positions is due in part to the modular nature of the LAGLIDADG meganucleases. A majority of the base contacts in the LAGLIDADG recognition interfaces are made by individual amino acid side chains, and the interface is relatively free of interconnectivity or hydrogen bonding networks between side chains that interact with adjacent bases. This generally allows manipulation of residues that interact with one base position without affecting side chain interactions at adjacent bases. The additive nature of the mutations listed in sections 2.2.1-2.2.4 above is also a direct result of the method used to identify these mutations. The method predicts side chain substitutions that interact directly with a single base. Interconnectivity or hydrogen bonding networks between side chains are generally avoided to maintain the independence of the substitutions within the recognition interface.

Certain combinations of side chain substitutions are completely or partially incompatible with one another. When an incompatible pair or set of amino acids is incorporated into a rationally-designed meganuclease, the resulting enzyme will have reduced or eliminated catalytic activity. Typically, these incompatibilities are due to steric interference between the side chains of the introduced amino acids, and activity can be restored by identifying and removing this interference. Specifically, when two amino acids with large side chains (e.g., amino acids from group 2 or 3) are incorporated at amino acid positions that are adjacent to one another in the meganuclease structure (e.g., positions 32 and 33, 28 and 40, 28 and 42, 42 and 77, or 68 and 77 in the case of meganucleases derived from I-CreI), it is likely that these two amino acids will interfere with one another and reduce enzyme activity. This interference can be eliminated by substituting one or both incompatible amino acids with an amino acid having a smaller side chain (e.g., group 1 or group 2). For example, in rationally-designed meganucleases derived from I-CreI, K28 interferes with both R40 and R42. To maximize enzyme activity, R40 and R42 can be combined with a serine or aspartic acid at position 28.
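As a minimal sketch, a candidate design can be screened for the adjacent position pairs named above. The pairs are taken from the text. The set of residues treated as "large" is a placeholder assumption, because the amino acid groups are defined elsewhere in the specification and are not reproduced in this section. The function name is hypothetical.

```python
# Illustrative sketch: flag adjacent position pairs that both carry a large
# side chain in an I-CreI-derived design.

# Adjacent position pairs named in the text for I-CreI-derived enzymes.
ADJACENT_PAIRS = [(32, 33), (28, 40), (28, 42), (42, 77), (68, 77)]

# ASSUMPTION: placeholder set of residues treated as having large side
# chains. The specification's own amino acid groups should be used instead.
LARGE_SIDE_CHAINS = set("KRQEHFYWM")


def steric_conflicts(design):
    """Return adjacent pairs where both positions carry a large residue.

    `design` maps an amino acid position to a one-letter residue code.
    Positions absent from `design` are treated as unmodified and ignored.
    """
    conflicts = []
    for first, second in ADJACENT_PAIRS:
        if (design.get(first) in LARGE_SIDE_CHAINS
                and design.get(second) in LARGE_SIDE_CHAINS):
            conflicts.append((first, second))
    return conflicts


# K28 together with R40 and R42 is the incompatible case given in the text.
print(steric_conflicts({28: "K", 40: "R", 42: "R"}))
# Replacing position 28 with serine removes the flagged pairs.
print(steric_conflicts({28: "S", 40: "R", 42: "R"}))
```

The check is purely positional and does not replace structural modeling of the side chains.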

Combinations of amino acid substitutions, identified as described herein, can be used to rationally alter the specificity of a wild-type meganuclease (or a previously modified meganuclease) from an original recognition sequence to a desired recognition sequence which may be present in a nucleic acid of interest (e.g., a genome). FIG. 2A, for example, shows the “sense” strand of the I-CreI meganuclease recognition sequence WT (SEQ ID NO: 4) as well as a number of other sequences for which a rationally-designed meganuclease would be useful. Conserved bases between the WT recognition sequence and the desired recognition sequence are shaded. In accordance with the invention, recombinant meganucleases based on the I-CreI meganuclease can be rationally-designed for each of these desired recognition sequences, as well as any others, by suitable amino acid substitutions as described herein.

3. Rationally-Designed Meganucleases with Altered DNA-Binding Affinity

As described above, the DNA-binding affinity of the recombinant meganucleases of the invention can be modulated by altering certain amino acids that form the contact surface with the phosphodiester backbone of DNA. The contact surface comprises those amino acids in the enzyme with β-carbons less than 9 Å from the DNA backbone, and with side chains oriented toward the DNA, irrespective of whether the residues make contacts with the DNA backbone in the wild-type meganuclease-DNA complex. Because DNA-binding is a necessary precursor to enzyme activity, increases/decreases in DNA-binding affinity have been shown to cause increases/decreases, respectively, in enzyme activity. However, increases/decreases in DNA-binding affinity also have been shown to cause decreases/increases, respectively, in the meganuclease sequence-specificity. Therefore, both activity and specificity can be modulated by modifying the phosphodiester backbone contacts.
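The distance criterion in the definition of the contact surface can be sketched as follows. This is a simplified illustration under stated assumptions: the coordinates are made up and not taken from any structure, the residue labels are hypothetical, and the second criterion (side chains oriented toward the DNA) is not implemented.

```python
# Illustrative sketch: select residues whose beta-carbon lies within 9
# angstroms of any DNA backbone atom. Coordinates and labels are made up
# for this example; side-chain orientation is NOT checked here.
import math

CUTOFF_ANGSTROM = 9.0


def within_cutoff(cbeta_coords, backbone_coords, cutoff=CUTOFF_ANGSTROM):
    """Return residue labels whose beta-carbon is closer than `cutoff`.

    `cbeta_coords` maps a residue label to an (x, y, z) tuple in angstroms.
    `backbone_coords` is a list of (x, y, z) tuples for backbone atoms.
    """
    selected = []
    for residue, cbeta in cbeta_coords.items():
        nearest = min(math.dist(cbeta, atom) for atom in backbone_coords)
        if nearest < cutoff:
            selected.append(residue)
    return selected


# Hypothetical coordinates (angstroms), for illustration only.
cbeta = {
    "res_A": (0.0, 0.0, 0.0),
    "res_B": (5.0, 0.0, 0.0),
    "res_C": (30.0, 0.0, 0.0),
}
backbone = [(0.0, 0.0, 6.0), (10.0, 0.0, 6.0)]

print(within_cutoff(cbeta, backbone))
```

In practice the coordinates would come from a structure of the meganuclease-DNA complex, and the orientation test would be applied to the residues that pass this filter.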

Specifically, to Increase Enzyme Activity/Decrease Enzyme Specificity:

(i) Remove electrostatic repulsion between the enzyme and DNA backbone. If an identified amino acid has a negatively-charged side chain (e.g., aspartic acid, glutamic acid) which would be expected to repel the negatively-charged DNA backbone, the repulsion can be eliminated by substituting an amino acid with an uncharged or positively-charged side chain, subject to effects of steric interference. An experimentally verified example is the mutation of glutamic acid 80 in I-CreI to glutamine.

(ii) Introduce an electrostatic attraction between the enzyme and the DNA backbone. At any of the positions of the contact surface, the introduction of an amino acid with a positively-charged side chain (e.g., lysine or arginine) is expected to increase binding affinity, subject to effects of steric interference.

(iii) Introduce a hydrogen-bond between the enzyme and the DNA backbone. If an amino acid of the contact surface does not make a hydrogen bond with the DNA backbone because it lacks an appropriate hydrogen-bonding functionality or has a side chain that is too short, too long, and/or too inflexible to interact with the DNA backbone, a polar amino acid capable of donating a hydrogen bond (e.g., serine, threonine, tyrosine, histidine, glutamine, asparagine, lysine, cysteine, or arginine) with the appropriate length and flexibility can be introduced, subject to effects of steric interference.

Specifically, to Decrease Enzyme Activity/Increase Enzyme Specificity:

(i) Introduce electrostatic repulsion between the enzyme and the DNA backbone. At any of the positions of the contact surface, the introduction of an amino acid with a negatively-charged side chain (e.g., glutamic acid, aspartic acid) is expected to decrease binding affinity, subject to effects of steric interference.

(ii) Remove electrostatic attraction between the enzyme and DNA. If any amino acid of the contact surface has a positively-charged side chain (e.g., lysine or arginine) that interacts with the negatively-charged DNA backbone, this favorable interaction can be eliminated by substituting an amino acid with an uncharged or negatively-charged side chain, subject to effects of steric interference. An experimentally verified example is the mutation of lysine 116 in I-CreI to aspartic acid.

(iii) Remove a hydrogen-bond between the enzyme and the DNA backbone. If any amino acid of the contact surface makes a hydrogen bond with the DNA backbone, it can be substituted with an amino acid that would not be expected to make a similar hydrogen bond because its side chain is not appropriately functionalized or it lacks the necessary length/flexibility characteristics.

For example, in some recombinant meganucleases based on I-CreI, the glutamic acid at position 80 in the I-CreI meganuclease is altered to either a lysine or a glutamine to increase activity. In another embodiment, the tyrosine at position 66 of I-CreI is changed to arginine or lysine, which increases the activity of the meganuclease. In yet another embodiment, enzyme activity is decreased by changing the lysine at position 34 of I-CreI to aspartic acid, changing the tyrosine at position 66 to aspartic acid, and/or changing the lysine at position 116 to aspartic acid.

The activities of the recombinant meganucleases can be modulated such that the recombinant enzyme has anywhere from no activity to very high activity with respect to a particular recognition sequence. For example, the DJ1 recombinant meganuclease, when carrying a glutamic acid mutation at position 26, loses activity completely. However, the combination of the glutamic acid substitution at position 26 and a glutamine substitution at position 80 creates a recombinant meganuclease with high specificity and activity toward a guanine at −4 within the recognition sequence half-site (see FIG. 1(D)).

In accordance with the invention, amino acids at various positions in proximity to the phosphodiester DNA backbone can be changed to simultaneously affect both meganuclease activity and specificity. This “tuning” of the enzyme specificity and activity is accomplished by increasing or decreasing the number of contacts made by amino acids with the phosphodiester backbone. A variety of contacts with the phosphodiester backbone can be facilitated by amino acid side chains. In some embodiments, ionic bonds, salt bridges, hydrogen bonds, and steric hindrance affect the association of amino acid side chains with the phosphodiester backbone. For example, for the I-CreI meganuclease, alteration of the lysine at position 116 to an aspartic acid removes a salt bridge between nucleic acid base pairs at positions −8 and −9, reducing the rate of enzyme cleavage but increasing the specificity.

The residues forming the backbone contact surface of each of the wild-type I-CreI (SEQ ID NO: 1), I-MsoI (SEQ ID NO: 6), I-SceI (SEQ ID NO: 9) and I-CeuI (SEQ ID NO: 12) meganucleases are identified in Table 5 below:

TABLE 5
I-CreI: P29, K34, T46, K48, R51, V64, Y66, E80, I81, K82, L112, K116, D137, K139, T140, T143
I-MsoI: K36, Q41, R51, N70, I85, G86, S87, T88, H89, Y118, Q122, K123, Q139, K143, R144, E147, S150, N152
I-SceI: N15, N17, L19, K20, K23, K63, L80, S81, H84, L92, N94, N120, K122, K148, Y151, K153, T156, N157, S159, N163, Q165, S166, Y188, K190, I191, K193, N194, K195, Y199, D201, S202, Y222, K223
I-CeuI: K21, D25, K28, K31, S68, N70, H94, R112, R114, S117, N120, D128, N129, R130, H172

To increase the affinity of an enzyme and thereby make it more active/less specific:

(1) Select an amino acid from Table 5 for the corresponding enzyme that is either negatively-charged (D or E), hydrophobic (A, C, F, G, I, L, M, P, V, W, Y), or uncharged/polar (H, N, Q, S, T).
(2) If the amino acid is negatively-charged or hydrophobic, mutate it to uncharged/polar (less effect) or positively-charged (K or R, more effect).
(3) If the amino acid is uncharged/polar, mutate it to positively-charged.

To decrease the affinity of an enzyme and thereby make it less active/more specific:

(1) Select an amino acid from Table 5 for the corresponding enzyme that is either positively-charged (K or R), hydrophobic (A, C, F, G, I, L, M, P, V, W, Y), or uncharged/polar (H, N, Q, S, T).
(2) If the amino acid is positively-charged, mutate it to uncharged/polar (less effect) or negatively-charged (more effect).
(3) If the amino acid is hydrophobic or uncharged/polar, mutate it to negatively-charged.
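The two sets of selection rules above can be sketched as a small lookup function. This is an illustration only: the function name, the goal labels ("increase" and "decrease") and the returned class labels are assumptions for this example, while the residue classes and the ordering from less to more effect follow the rules as stated.

```python
# Illustrative sketch of the affinity-tuning rules. Residue classes follow
# the lists in the text; names and labels are assumptions for this example.

NEGATIVE = set("DE")
POSITIVE = set("KR")
HYDROPHOBIC = set("ACFGILMPVWY")
POLAR = set("HNQST")


def suggest_substitution_classes(residue, goal):
    """Return candidate substitution classes, ordered from less to more effect.

    `residue` is the one-letter code of a contact-surface residue.
    `goal` is "increase" or "decrease" (DNA-binding affinity).
    An empty list means the rules offer no substitution for that residue.
    """
    if goal == "increase":
        if residue in NEGATIVE or residue in HYDROPHOBIC:
            return ["uncharged/polar", "positively-charged"]
        if residue in POLAR:
            return ["positively-charged"]
        return []
    if goal == "decrease":
        if residue in POSITIVE:
            return ["uncharged/polar", "negatively-charged"]
        if residue in HYDROPHOBIC or residue in POLAR:
            return ["negatively-charged"]
        return []
    raise ValueError("goal must be 'increase' or 'decrease'")


# E80 of I-CreI: glutamine (uncharged/polar) or lysine (positively-charged).
print(suggest_substitution_classes("E", "increase"))
# K116 of I-CreI: aspartic acid is a negatively-charged replacement.
print(suggest_substitution_classes("K", "decrease"))
```

The sketch returns classes rather than specific residues, since the choice within a class remains subject to the steric considerations described above.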

4. Heterodimeric Meganucleases

In another aspect, the invention provides meganucleases which are heterodimers formed by the association of two monomers, one of which may be a wild-type and one or both of which may be a non-naturally-occurring or recombinant form. For example, wild-type I-CreI meganuclease is normally a homodimer composed of two monomers that each bind to one half-site in the pseudo-palindromic recognition sequence. A heterodimeric recombinant meganuclease can be produced by combining two meganucleases that recognize different half-sites, for example by co-expressing the two meganucleases in a cell or by mixing two meganucleases in solution. The formation of heterodimers can be favored over the formation of homodimers by altering amino acids on each of the two monomers that affect their association into dimers. In particular embodiments, certain amino acids at the interface of the two monomers are altered from positively-charged amino acids (K or R) to negatively-charged amino acids (D or E) on a first monomer and from negatively-charged amino acids to positively-charged amino acids on a second monomer (Table 6). For example, in the case of meganucleases derived from I-CreI, lysines at positions 7 and 57 are mutated to glutamic acids in the first monomer and glutamic acids at positions 8 and 61 are mutated to lysines in the second monomer. The result of this process is a pair of monomers in which the first monomer has an excess of negatively-charged residues at the dimer interface and the second monomer has an excess of positively-charged residues at the dimer interface. The first and second monomers will, therefore, associate preferentially with each other rather than with identical monomers, due to the electrostatic interactions between the altered amino acids at the interface.

TABLE 6
I-CreI: First Monomer Substitutions: K7 to E7 or D7; K57 to E57 or D57; K96 to E96 or D96
I-CreI: Second Monomer Substitutions: E8 to K8 or R8; E61 to K61 or R61
I-MsoI: First Monomer Substitutions: R302 to E302 or D302
I-MsoI: Second Monomer Substitutions: D20 to K60 or R60; E11 to K11 or R11; Q64 to K64 or R64
I-CeuI: First Monomer Substitutions: R93 to E93 or D93
I-CeuI: Second Monomer Substitutions: E152 to K152 or R152
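The effect of the I-CreI interface substitutions on net charge can be tallied with a short sketch. It is illustrative only: the tuple layout and function name are assumptions, and side-chain charges are simplified to +1 for K and R, −1 for D and E, and 0 otherwise. The substitutions used are the I-CreI examples given in the text.

```python
# Illustrative sketch: net change in formal side-chain charge produced by a
# set of interface substitutions. Charges are simplified to +1 (K, R),
# -1 (D, E) and 0 for all other residues.

CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge_change(substitutions):
    """Sum the charge change over (old_residue, position, new_residue) tuples."""
    return sum(
        CHARGE.get(new, 0) - CHARGE.get(old, 0)
        for old, _position, new in substitutions
    )


# I-CreI example from the text: K7E and K57E in the first monomer,
# E8K and E61K in the second monomer.
first_monomer = [("K", 7, "E"), ("K", 57, "E")]
second_monomer = [("E", 8, "K"), ("E", 61, "K")]

print(net_charge_change(first_monomer), net_charge_change(second_monomer))
```

The two monomers shift in opposite directions by the same amount, which is the basis for the preferential heterodimer association described in this section.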

Alternatively, or in addition, certain amino acids at the interface of the two monomers can be altered to sterically hinder homodimer formation. Specifically, amino acids in the dimer interface of one monomer are substituted with larger or bulkier residues that will sterically prevent the homodimer. Amino acids in the dimer interface of the second monomer optionally can be substituted with smaller residues to compensate for the bulkier residues in the first monomer and remove any clashes in the heterodimer, or can be unmodified.

In another alternative or additional embodiment, an ionic bridge or hydrogen bond can be buried in the hydrophobic core of a heterodimeric interface. Specifically, a hydrophobic residue on one monomer at the core of the interface can be substituted with a positively charged residue. In addition, a hydrophobic residue on the second monomer, that interacts in the wild type homodimer with the hydrophobic residue substituted in the first monomer, can be substituted with a negatively charged residue. Thus, the two substituted residues can form an ionic bridge or hydrogen bond. At the same time, the electrostatic repulsion of an unsatisfied charge buried in a hydrophobic interface should disfavor homodimer formation.

Finally, as noted above, each monomer of the heterodimer can have different amino acids substituted in the DNA recognition region such that each has a different DNA half-site and the combined dimeric DNA recognition sequence is non-palindromic.

5. Methods of Producing Recombinant Cells and Organisms

Aspects of the present invention further provide methods for producing recombinant, transgenic or otherwise genetically-modified cells and organisms using rationally-designed meganucleases. Thus, in certain embodiments, recombinant meganucleases are developed to specifically cause a double-stranded break at a single site or at relatively few sites in the genomic DNA of a cell or an organism to allow for precise insertion(s) of a sequence of interest by homologous recombination. In other embodiments, recombinant meganucleases are developed to specifically cause a double-stranded break at a single site or at relatively few sites in the genomic DNA of a cell or an organism to either (a) allow for rare insertion(s) of a sequence of interest by non-homologous end-joining or (b) allow for the disruption of the target sequence by non-homologous end-joining. As used herein with respect to homologous recombination or non-homologous end-joining of sequences of interest, the term “insertion” means the ligation of a sequence of interest into a chromosome such that the sequence of interest is integrated into the chromosome. In the case of homologous recombination, an inserted sequence can replace an endogenous sequence, such that the original DNA is replaced by exogenous DNA of equal length, but with an altered nucleotide sequence. Alternatively, an inserted sequence can include more or fewer bases than the sequence it replaces.
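Whether a recognition sequence occurs at a single site or at relatively few sites in a given DNA sequence can be checked by searching both strands, as in the sketch below. It is a minimal illustration: the function names are hypothetical, the "genome" is a made-up toy string, and only exact matches are counted, whereas a real enzyme may tolerate some mismatches.

```python
# Illustrative sketch: locate exact occurrences of a recognition sequence on
# either strand of a DNA sequence. The toy genome below is made up.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_sites(genome, site):
    """Return sorted (index, strand) pairs for every exact match of `site`.

    "+" marks a match on the given strand; "-" marks a match of the
    reverse complement, i.e. the site lies on the opposite strand.
    """
    patterns = {"+": site}
    rc = reverse_complement(site)
    if rc != site:  # a palindromic site reads the same on both strands
        patterns["-"] = rc
    hits = []
    for strand, pattern in patterns.items():
        start = genome.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


SITE = "ATAACGGTCCTAAGGTAGCGAA"  # I-CeuI recognition sequence, sense strand
toy_genome = "GGG" + SITE + "TTT" + reverse_complement(SITE) + "CCC"

hits = find_sites(toy_genome, SITE)
print(len(hits), hits)
```

A design intended for a single insertion locus would ideally return one hit for the target genome; several hits indicate additional potential cleavage sites.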

Therefore, in accordance with this aspect of the invention, the recombinant organisms include, but are not limited to, monocot plant species such as rice, wheat, corn (maize) and rye, and dicot species such as legumes (e.g., kidney beans, soybeans, lentils, peanuts, peas), alfalfa, clover, tobacco and Arabidopsis species. In addition, the recombinant organisms can include, but are not limited to, animals such as humans and non-human primates, horses, cows, goats, pigs, sheep, dogs, cats, guinea pigs, rats, mice, lizards, fish and insects such as Drosophila species. In other embodiments, the organism is a fungus such as a Candida, Neurospora or Saccharomyces species.

In some embodiments, the methods of the invention involve the introduction of a sequence of interest into a cell such as a germ cell or stem cell that can become a mature recombinant organism or allow the resultant genetically-modified organism to give rise to progeny carrying the inserted sequence of interest in its genome.

Meganuclease proteins can be delivered into cells to cleave genomic DNA, which allows for homologous recombination or non-homologous end-joining at the cleavage site with a sequence of interest, by a variety of different mechanisms known in the art. For example, the recombinant meganuclease protein can be introduced into a cell by techniques including, but not limited to, microinjection or liposome transfections (see, e.g., Lipofectamine™, Invitrogen Corp., Carlsbad, Calif.). The liposome formulation can be used to facilitate lipid bilayer fusion with a target cell, thereby allowing the contents of the liposome or proteins associated with its surface to be brought into the cell. Alternatively, the enzyme can be fused to an appropriate uptake peptide such as that from the HIV TAT protein to direct cellular uptake (see, e.g., Hudecz et al. (2005), Med. Res. Rev. 25: 679-736).

Alternatively, gene sequences encoding the meganuclease protein are inserted into a vector and transfected into a eukaryotic cell using techniques known in the art (see, e.g., Ausubel et al., Current Protocols in Molecular Biology, Wiley 1999). The sequence of interest can be introduced in the same vector, a different vector, or by other means known in the art.

Non-limiting examples of vectors for DNA transfection include virus vectors, plasmids, cosmids, and YAC vectors. Transfection of DNA sequences can be accomplished by a variety of methods known to those of skill in the art. For instance, liposomes and immunoliposomes are used to deliver DNA sequences to cells (see, e.g., Lasic et al. (1995), Science 267: 1275-76). In addition, viruses can be utilized to introduce vectors into cells (see, e.g., U.S. Pat. No. 7,037,492). Alternatively, transfection strategies can be utilized such that the vectors are introduced as naked DNA (see, e.g., Rui et al. (2002), Life Sci. 71(15): 1771-8).

General methods for delivering nucleic acids into cells include: (1) chemical methods (Graham et al. (1973), Virology 54(2):536-539; Zatloukal et al. (1992), Ann. N.Y. Acad. Sci., 660:136-153); (2) physical methods such as microinjection (Capecchi (1980), Cell 22(2):479-488), electroporation (Wong et al. (1982), Biochem. Biophys. Res. Commun. 107(2):584-587; Fromm et al. (1985), Proc. Nat'l Acad. Sci. USA 82(17):5824-5828; U.S. Pat. No. 5,384,253) and ballistic injection (Johnston et al. (1994), Methods Cell. Biol. 43(A): 353-365; Fynan et al. (1993), Proc. Nat'l Acad. Sci. USA 90(24): 11478-11482); (3) viral vectors (Clapp (1993), Clin. Perinatol. 20(1): 155-168; Lu et al. (1993), J. Exp. Med. 178(6):2089-2096; Eglitis et al. (1988), Adv. Exp. Med. Biol. 241:19-27; Eglitis et al. (1988), Biotechniques 6(7):608-614); and (4) receptor-mediated mechanisms (Curiel et al. (1991), Proc. Nat'l Acad. Sci. USA 88(19):8850-8854; Curiel et al. (1992), Hum. Gen. Ther. 3(2):147-154; Wagner et al. (1992), Proc. Nat'l Acad. Sci. USA 89(13):6099-6103).

In certain embodiments, a genetically-modified plant is produced, which contains the sequence of interest inserted into the genome. In certain embodiments, the genetically-modified plant is produced by transfecting the plant cell with DNA sequences corresponding to the recombinant meganuclease and the sequence of interest, which may or may not be flanked by the meganuclease recognition sequences and/or sequences substantially identical to the target sequence. In other embodiments, the genetically-modified plant is produced by transfecting the plant cell with DNA sequences corresponding to the recombinant meganuclease only, such that cleavage promotes non-homologous end-joining and disrupts the target sequence containing the recognition sequence. In such embodiments, the meganuclease sequences are under the control of regulatory sequences that allow for expression of the meganuclease in the host plant cells. These regulatory sequences include, but are not limited to, constitutive plant promoters such as the NOS promoter, chemically-inducible gene promoters such as the dexamethasone-inducible promoter (see, e.g., Gremillon et al. (2004), Plant J. 37:218-228), and plant tissue specific promoters such as the LGC1 promoter (see, e.g., Singh et al. (2003), FEBS Lett. 542:47-52).

Suitable methods for introducing DNA into plant cells include virtually any method by which DNA can be introduced into a cell, including but not limited to Agrobacterium infection, PEG-mediated transformation of protoplasts (Omirulleh et al. (1993), Plant Molecular Biology, 21:415-428), desiccation/inhibition-mediated DNA uptake, electroporation, agitation with silicon carbide fibers, ballistic injection or microprojectile bombardment, and the like.

In other embodiments, a genetically-modified animal is produced using a recombinant meganuclease. As with plant cells, the nucleic acid sequences can be introduced into a germ cell or a cell that will eventually become a transgenic organism. In some embodiments, the cell is a fertilized egg, and exogenous DNA molecules can be injected into the pro-nucleus of the fertilized egg. The micro-injected eggs are then transferred into the oviducts of pseudopregnant foster mothers and allowed to develop. The recombinant meganuclease is expressed in the fertilized egg (e.g., under the control of a constitutive promoter, such as 3-phosphoglycerate kinase), and facilitates homologous recombination of the sequence of interest into one or a few discrete sites in the genome. Alternatively, the genetically-modified animals can be obtained by utilizing recombinant embryonic stem (“ES”) cells for the generation of the transgenics, as described by Gossler et al. (1986), Proc. Natl. Acad. Sci. USA 83:9065-9069.

In certain embodiments, a recombinant mammalian expression vector is capable of directing tissue-specific expression of the nucleic acid preferentially in a particular cell type. Tissue-specific regulatory elements are known in the art. Non-limiting examples of suitable tissue-specific promoters include the albumin promoter (liver-specific; Pinkert et al. (1987), Genes Dev. 1: 268-277), lymphoid-specific promoters (Calame and Eaton (1988), Adv. Immunol. 43: 235-275), in particular promoters of T cell receptors (Winoto and Baltimore (1989), EMBO J. 8: 729-733) and immunoglobulins (Banerji et al. (1983), Cell 33: 729-740; Queen and Baltimore (1983), Cell 33: 741-748), neuron-specific promoters (e.g., the neurofilament promoter; Byrne and Ruddle (1989), Proc. Natl. Acad. Sci. USA 86: 5473-5477), pancreas-specific promoters (Edlund et al. (1985), Science 230: 912-916), and mammary gland-specific promoters (e.g., milk whey promoter; U.S. Pat. No. 4,873,316 and European Pat. Pub. EP 0 264 166). Developmentally-regulated promoters are also encompassed, e.g., the murine hox promoters (Kessel and Gruss (1990), Science 249: 374-379) and the α-fetoprotein promoter (Camper and Tilghman (1989), Genes Dev. 3: 537-546).

In certain embodiments, a rationally-designed meganuclease may be tagged with a peptide epitope (e.g., an HA, FLAG, or Myc epitope) to monitor expression levels or localization. In some embodiments, the meganuclease may be fused to a sub-cellular localization signal such as a nuclear-localization signal (e.g., the nuclear localization signal from SV40) or chloroplast or mitochondrial localization signals. In other embodiments, the meganuclease may be fused to a nuclear export signal to localize it to the cytoplasm. The meganuclease may also be fused to an unrelated protein or protein domain such as a protein that stimulates DNA-repair or homologous recombination (e.g., recA, RAD51, RAD52, RAD54, RAD57 or BRCA2).

6. Methods for Gene Therapy

Aspects of the invention allow for the use of recombinant meganuclease for gene therapy. As used herein, “gene therapy” means therapeutic treatments that comprise introducing into a patient a functional copy of at least one gene, or gene regulatory sequence such as a promoter, enhancer, or silencer to replace a gene or gene regulatory region that is defective in its structure and/or function. The term “gene therapy” can also refer to modifications made to a deleterious gene or regulatory element (e.g., oncogenes) that reduce or eliminate expression of the gene. Gene therapy can be performed to treat congenital conditions, conditions resulting from mutations or damage to specific genetic loci over the life of the patient, or conditions resulting from infectious organisms.

In some aspects of the invention, dysfunctional genes are replaced or disabled by the insertion of exogenous nucleic acid sequences into a region of the genome affecting gene expression. In certain embodiments, the recombinant meganuclease is targeted to a particular sequence in the region of the genome to be modified so as to alleviate the condition. The sequence can be a region within an exon, intron, promoter, or other regulatory region that is causing dysfunctional expression of the gene. As used herein, the term “dysfunctional expression” means aberrant expression of a gene product either by the cell producing too little of the gene product, too much of the gene product, or producing a gene product that has a different function such as lacking the necessary function or having more than the necessary function.

Exogenous nucleic acid sequences inserted into the modified region can be used to provide “repaired” sequences that normalize the gene. Gene repair can be accomplished by the introduction of proper gene sequences into the gene, allowing for proper function to be reestablished. In these embodiments, the nucleic acid sequence to be inserted can be the entire coding sequence for a protein or, in certain embodiments, a fragment of the gene comprising only the region to be repaired. In other embodiments, the nucleic acid sequence to be inserted comprises a promoter sequence or other regulatory elements such that mutations causing abnormal expression or regulation are repaired. In other embodiments, the nucleic acid sequence to be inserted contains the appropriate translation stop codon lacking in a mutated gene. The nucleic acid sequence can also have sequences for stopping transcription in a recombinant gene lacking appropriate transcriptional stop signals.

Alternatively, the nucleic acid sequences can eliminate gene function altogether by disrupting the regulatory sequence of the gene or providing a silencer to eliminate gene function. In some embodiments, the exogenous nucleic acid sequence provides a translation stop codon to prevent expression of the gene product. In other embodiments, the exogenous nucleic acid sequences provide a transcription stop element to prevent expression of a full length RNA molecule. In still other embodiments, gene function is disrupted directly by the meganuclease by introducing base insertions, base deletions, and/or frameshift mutations through non-homologous end-joining.

In many instances, it is desirable to direct the proper genetic sequences to a target cell or population of cells that is the cause of the disease condition. Such targeting of therapeutics prevents healthy cells from being targeted by the therapeutics. This increases the efficacy of the treatment, while decreasing the potentially adverse effects that the treatment could have on healthy cells.

Delivery of recombinant meganuclease genes and the sequence of interest to be inserted into the genome to the cells of interest can be accomplished by a variety of mechanisms. In some embodiments, the nucleic acids are delivered to the cells by way of viruses with particular viral genes inactivated to prevent reproduction of the virus. Thus, a virus can be altered so that it is capable only of delivery and maintenance within a target cell, but does not retain the ability to replicate within the target cell or tissue. One or more DNA sequences can be introduced to the altered viral genome, so as to produce a viral genome that acts like a vector, and may or may not be inserted into a host genome and subsequently expressed. More specifically, certain embodiments include employing a retroviral vector such as, but not limited to, the MFG or pLJ vectors. An MFG vector is a simplified Moloney murine leukemia virus vector (MoMLV) in which the DNA sequences encoding the pol and env proteins have been deleted to render it replication defective. A pLJ retroviral vector is also a form of the MoMLV (see, e.g., Korman et al. (1987), Proc. Nat'l Acad. Sci., 84:2150-2154). In other embodiments, a recombinant adenovirus or adeno-associated virus can be used as a delivery vector.

In other embodiments, the delivery of recombinant meganuclease protein and/or recombinant meganuclease gene sequences to a target cell is accomplished by the use of liposomes. The production of liposomes containing nucleic acid and/or protein cargo is known in the art (see, e.g., Lasic et al. (1995), Science 267: 1275-76). Immunoliposomes incorporate antibodies against cell-associated antigens into liposomes, and can deliver DNA sequences for the meganuclease or the meganuclease itself to specific cell types (see, e.g., Lasic et al. (1995), Science 267: 1275-76; Young et al. (2005), J. Calif. Dent. Assoc. 33(12): 967-71; Pfeiffer et al. (2006), J. Vasc. Surg. 43(5):1021-7). Methods for producing and using liposome formulations are well known in the art (see, e.g., U.S. Pat. No. 6,316,024, U.S. Pat. No. 6,379,699, U.S. Pat. No. 6,387,397, U.S. Pat. No. 6,511,676 and U.S. Pat. No. 6,593,308, and references cited therein). In some embodiments, liposomes are used to deliver the sequences of interest as well as the recombinant meganuclease protein or recombinant meganuclease gene sequences.

7. Methods for Treating Pathogen Infection.

Aspects of the invention also provide methods of treating infection by a pathogen. Pathogenic organisms include viruses such as, but not limited to, herpes simplex virus 1, herpes simplex virus 2, human immunodeficiency virus 1, human immunodeficiency virus 2, variola virus, polio virus, Epstein-Barr virus, and human papilloma virus, and bacterial organisms such as, but not limited to, Bacillus anthracis, Haemophilus species, Pneumococcus species, Staphylococcus aureus, Streptococcus species, methicillin-resistant Staphylococcus aureus, and Mycobacterium tuberculosis. Pathogenic organisms also include fungal organisms such as, but not limited to, Candida, Blastomyces, Cryptococcus, and Histoplasma species.

In some embodiments, a rationally-designed meganuclease can be targeted to a recognition sequence within the pathogen genome, e.g., to a gene or regulatory element that is essential for growth, reproduction, or toxicity of the pathogen. In certain embodiments, the recognition sequence may be in a bacterial plasmid. Meganuclease-mediated cleavage of a recognition sequence in a pathogen genome can stimulate mutation within a targeted, essential gene in the form of an insertion, deletion or frameshift, by stimulating non-homologous end-joining. Alternatively, cleavage of a bacterial plasmid can result in loss of the plasmid along with any genes encoded on it, such as toxin genes (e.g., B. anthracis Lethal Factor gene) or antibiotic resistance genes. As noted above, the meganuclease may be delivered to the infected patient, animal, or plant in either protein or nucleic acid form using techniques that are common in the art. In certain embodiments, the meganuclease gene may be incorporated into a bacteriophage genome for delivery to pathogenic bacteria.

Aspects of the invention also provide therapeutics for the treatment of certain forms of cancer. Because human viruses are often associated with tumor formation (e.g., Epstein-Barr Virus and nasopharyngeal carcinomas; Human Papilloma Virus and cervical cancer), inactivation of these viral pathogens may prevent cancer development or progression. Alternatively, double-stranded breaks targeted to the genomes of these tumor-associated viruses using rationally-designed meganucleases may be used to trigger apoptosis through the DNA damage response pathway. In this manner, it may be possible to selectively induce apoptosis in tumor cells harboring the viral genome.

8. Methods for Genotyping and Pathogen Identification

Aspects of the invention also provide tools for in vitro molecular biology research and development. It is common in the art to use site-specific endonucleases (e.g., restriction enzymes) for the isolation, cloning, and manipulation of nucleic acids such as plasmids, PCR products, BAC sequences, YAC sequences, viruses, and genomic sequences from eukaryotic and prokaryotic organisms (see, e.g., Ausubel et al., Current Protocols in Molecular Biology, Wiley 1999). Thus, in some embodiments, a rationally-designed meganuclease may be used to manipulate nucleic acid sequences in vitro. For example, rationally-designed meganucleases recognizing a pair of recognition sequences within the same DNA molecule can be used to isolate the intervening DNA segment for subsequent manipulation such as ligation into a bacterial plasmid, BAC, or YAC.

In another aspect, this invention provides tools for the identification of pathogenic genes and organisms. In one embodiment, rationally-designed meganucleases can be used to cleave recognition sites corresponding to polymorphic genetic regions correlated to disease to distinguish disease-causing alleles from healthy alleles (e.g., a rationally-designed meganuclease which recognizes the ΔF508 allele of the human CFTR gene, see Example 4). In this embodiment, DNA sequences isolated from a human patient or other organism are digested with a rationally-designed meganuclease, possibly in conjunction with additional site-specific nucleases, and the resulting DNA fragment pattern is analyzed by gel electrophoresis, capillary electrophoresis, mass spectrometry, or other methods known in the art. This fragmentation pattern and, specifically, the presence or absence of cleavage by the rationally-designed meganuclease, indicates the genotype of the organism by revealing whether or not the recognition sequence is present in the genome. In another embodiment, a rationally-designed meganuclease is targeted to a polymorphic region in the genome of a pathogenic virus, fungus, or bacterium and used to identify the organism. In this embodiment, the rationally-designed meganuclease cleaves a recognition sequence that is unique to the pathogen (e.g., the spacer region between the 16S and 23S rRNA genes in a bacterium; see, e.g., van der Giessen et al. (1994), Microbiology 140:1103-1108) and can be used to distinguish the pathogen from other closely-related organisms following endonuclease digest of the genome and subsequent analysis of the fragmentation pattern by electrophoresis, mass spectrometry, or other methods known in the art.
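By way of illustration only, the genotyping logic described above reduces to asking whether the recognition sequence occurs on either strand of the sample DNA and, if so, what fragment lengths a digest of a linear molecule would be expected to give. The following Python sketch makes that concrete. The function names, the made-up flanking sequences, and the placement of the cut at the midpoint of the 22-base-pair site are assumptions introduced for this example and are not taken from the specification.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_sites(dna, site):
    """Return sorted 0-based start positions of `site` on either strand of `dna`."""
    hits = set()
    for target in {site, reverse_complement(site)}:
        start = dna.find(target)
        while start != -1:
            hits.add(start)
            start = dna.find(target, start + 1)
    return sorted(hits)


def fragment_lengths(dna, site):
    """Predict fragment lengths after digesting a linear molecule.

    Assumption: each cut is placed at the midpoint of the site.
    """
    cuts = [start + len(site) // 2 for start in find_sites(dna, site)]
    edges = [0] + cuts + [len(dna)]
    return [right - left for left, right in zip(edges, edges[1:])]


CF_SITE = "GAAAATATCATTGGTGTTTCCT"  # SEQ ID NO: 25

# Hypothetical test molecules with made-up flanking sequence.
sample_with_site = "ACGT" * 10 + CF_SITE + "TTGCA" * 8
sample_without_site = "ACGT" * 10 + "TTGCA" * 8

print(fragment_lengths(sample_with_site, CF_SITE))     # [51, 51]
print(fragment_lengths(sample_without_site, CF_SITE))  # [80]
```

A single uncut fragment thus corresponds to absence of the recognition sequence, while two fragments correspond to its presence, which is the readout the electrophoretic analysis would provide.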

9. Methods for the Production of Custom DNA-Binding Domains.

In another aspect, the invention provides rationally-designed DNA-binding proteins that lack endonuclease cleavage activity. The catalytic activity of a rationally-designed meganuclease can be eliminated by mutating amino acids involved in catalysis (e.g., the mutation of Q47 to E in I-CreI, see Chevalier et al. (2001), Biochemistry 43:14015-14026; the mutation of D44 or D145 to N in I-SceI; the mutation of E66 to Q in I-CeuI; the mutation of D22 to N in I-MsoI). The inactivated meganuclease can then be fused to an effector domain from another protein including, but not limited to, a transcription activator (e.g., the GAL4 transactivation domain or the VP16 transactivation domain), a transcription repressor (e.g., the KRAB domain from the Kruppel protein), a DNA methylase domain (e.g., M.CviPI or M.SssI), or a histone deacetylase domain (e.g., HDAC1 or HDAC2). Chimeric proteins consisting of an engineered DNA-binding domain, most notably an engineered zinc finger domain, and an effector domain are known in the art (see, e.g., Papworth et al. (2006), Gene 366:27-38).

EXAMPLES

This invention is further illustrated by the following examples, which should not be construed as limiting. Those skilled in the art will recognize, or be able to ascertain, using no more than routine experimentation, numerous equivalents to the specific substances and procedures described herein. Such equivalents are intended to be encompassed in the scope of the claims that follow the examples below. Examples 1-4 below refer specifically to rationally-designed meganucleases based on I-CreI, but rationally-designed meganucleases based on I-SceI, I-MsoI, I-CeuI, and other LAGLIDADG meganucleases can be similarly produced and used, as described herein.

Example 1 Rational Design of Meganucleases Recognizing the HIV-1 TAT Gene

1. Meganuclease Design.

A pair of meganucleases were designed to recognize and cleave the DNA site 5′-GAAGAGCTCATCAGAACAGTCA-3′ (SEQ ID NO: 15) found in the HIV-1 TAT gene. In accordance with Table 1, two meganucleases, TAT1 and TAT2, were designed to bind the half-sites 5′-GAAGAGCTC-3′ (SEQ ID NO: 16) and 5′-TGACTGTTC-3′ (SEQ ID NO: 17), respectively, using the following base contacts (non-WT contacts are in bold):

TAT1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              G        A        A        G        A        G        C        T        C
Contact Residues  S32      Y33      N30/Q38  R40      K28      S26/R77  K24/Y68  Q44      R70

TAT2:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              T        G        A        C        T        G        T        T        C
Contact Residues  C32      R33      N30/Q38  R28/E40  M66      S26/R77  Y68      Q44      R70
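The relationship between the 22-base-pair target site and the two nine-base half-sites listed above can be checked mechanically: the first half-site is the first nine bases of the site as written, and the second is the reverse complement of the last nine bases, so that both are read 5′ to 3′ toward the center of the site. The short Python sketch below is offered only as an illustration; the function names are chosen for this example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def half_sites(site, half_len=9):
    """Split a recognition sequence into its two half-sites.

    Each half-site is reported 5' to 3', running from position -9 to -1.
    """
    left = site[:half_len]
    right = reverse_complement(site[-half_len:])
    return left, right


TAT_SITE = "GAAGAGCTCATCAGAACAGTCA"  # SEQ ID NO: 15

print(half_sites(TAT_SITE))  # ('GAAGAGCTC', 'TGACTGTTC')
```

The two strings returned correspond to SEQ ID NO: 16 and SEQ ID NO: 17, respectively.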

The two enzymes were cloned, expressed in E. coli, and assayed for enzyme activity against the corresponding DNA recognition sequence as described below. In both cases, the rationally-designed meganucleases were found to be inactive. A second generation of each was then produced in which E80 was mutated to Q to improve contacts with the DNA backbone. The second generation TAT2 enzyme was found to be active against its intended recognition sequence while the second generation TAT1 enzyme remained inactive. Visual inspection of the wild-type I-CreI co-crystal structure suggested that TAT1 was inactive due to a steric clash between R40 and K28. To alleviate this clash, TAT1 variants were produced in which K28 was mutated to an amino acid with a smaller side chain (A, S, T, or C) while maintaining the Q80 mutation. When these enzymes were produced in E. coli and assayed, the TAT1 variants with S28 and T28 were both found to be active against the intended recognition sequence while maintaining the desired base preference at position -7.

2. Construction of Recombinant Meganucleases.

Mutations for the redesigned I-CreI enzymes were introduced using mutagenic primers in an overlapping PCR strategy. Recombinant DNA fragments of I-CreI generated in a primary PCR were joined in a secondary PCR to produce full-length recombinant nucleic acids. All recombinant I-CreI constructs were cloned into pET21a vectors (Novagen Corp., San Diego, Calif.) with a six-histidine tag fused at the 3′ end of the gene for purification. All nucleic acid sequences were confirmed using Sanger dideoxynucleotide sequencing (see Sanger et al. (1977), Proc. Natl. Acad. Sci. USA. 74(12): 5463-7).

Wild-type I-CreI and all engineered meganucleases were expressed and purified using the following method. The constructs cloned into a pET21a vector were transformed into chemically competent BL21 (DE3) pLysS cells, and plated on standard 2xYT plates containing 200 μg/ml carbenicillin. Following overnight growth, transformed bacterial colonies were scraped from the plates and used to inoculate 50 ml of 2xYT broth. Cells were grown at 37° C. with shaking until they reached an optical density of 0.9 at a wavelength of 600 nm. The growth temperature was then reduced from 37° C. to 22° C. Protein expression was induced by the addition of 1 mM IPTG, and the cells were incubated with agitation for two and a half hours. Cells were then pelleted by centrifugation for 10 min. at 6000× g. Pellets were resuspended in 1 ml binding buffer (20 mM Tris-HCl, pH 8.0, 500 mM NaCl, 10 mM imidazole) by vortexing. The cells were then disrupted with 12 pulses of sonication at 50% power and the cell debris was pelleted by centrifugation for 15 min. at 14,000× g. Cell supernatants were diluted in 4 ml binding buffer and loaded onto a 200 μl nickel-charged metal-chelating Sepharose column (Pharmacia).

The column was subsequently washed with 4 ml wash buffer (20 mM Tris-HCl, pH 8.0, 500 mM NaCl, 60 mM imidazole) and with 0.2 ml elution buffer (20 mM Tris-HCl, pH 8.0, 500 mM NaCl, 400 mM imidazole). Meganuclease enzymes were eluted with an additional 0.6 ml of elution buffer and concentrated to 50-130 μl using Vivospin disposable concentrators (ISC, Inc., Kaysville, Utah). The enzymes were exchanged into SA buffer (25 mM Tris-HCl, pH 8.0, 100 mM NaCl, 5 mM MgCl2, 5 mM EDTA) for assays and storage using Zeba spin desalting columns (Pierce Biotechnology, Inc., Rockford, Ill.). The enzyme concentration was determined by absorbance at 280 nm using an extinction coefficient of 23,590 M⁻¹ cm⁻¹. Purity and molecular weight of the enzymes were then confirmed by MALDI-TOF mass spectrometry.
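The concentration determination mentioned above follows from the Beer-Lambert relation, concentration = A280 / (ε × path length). A minimal Python sketch is given below for illustration. The extinction coefficient is the value stated in the text; the 1 cm path length, the optional dilution factor, and the absorbance reading of 0.118 are assumptions made for this example and are not reported measurements.

```python
EXTINCTION_COEFF = 23590.0  # M^-1 cm^-1, value stated in the text


def molar_concentration(a280, path_cm=1.0, dilution=1.0):
    """Beer-Lambert estimate of protein concentration in mol/L.

    `path_cm` and `dilution` are illustrative parameters (assumptions).
    """
    return a280 * dilution / (EXTINCTION_COEFF * path_cm)


# Hypothetical absorbance reading, used only to show the arithmetic.
conc = molar_concentration(0.118)
print(f"{conc * 1e6:.2f} uM")  # 5.00 uM
```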

Heterodimeric enzymes were produced either by purifying the two proteins independently and mixing them in vitro, or by constructing an artificial operon for tandem expression of the two proteins in E. coli. In the former case, the purified meganucleases were mixed 1:1 in solution and pre-incubated at 42° C. for 20 minutes prior to the addition of DNA substrate. In the latter case, the two genes were cloned sequentially into the pET21a expression vector using NdeI/EcoRI and EcoRI/HindIII. The first gene in the operon ends with two stop codons to prevent read-through errors during translation. A 12-base pair nucleic acid spacer and a Shine-Dalgarno sequence from the pET21 vector separated the first and second genes in the artificial operon.

3. Cleavage Assays.

All enzymes purified as described above were assayed for activity by incubation with linear, double-stranded DNA substrates containing the meganuclease recognition sequence. Synthetic oligonucleotides corresponding to both sense and antisense strands of the recognition sequence were annealed and were cloned into the SmaI site of the pUC19 plasmid by blunt-end ligation. The sequences of the cloned binding sites were confirmed by Sanger dideoxynucleotide sequencing. All plasmid substrates were linearized with XmnI, ScaI or BpmI concurrently with the meganuclease digest. The enzyme digests contained 5 μl of 0.05 μM DNA substrate, 2.5 μl of 5 μM recombinant I-CreI meganuclease, 9.5 μl SA buffer, and 0.5 μl XmnI, ScaI, or BpmI. Digests were incubated for four hours at 37° C. (or at 42° C. for certain meganuclease enzymes). Digests were stopped by adding 0.3 mg/ml Proteinase K and 0.5% SDS, and incubated for one hour at 37° C. Digests were analyzed on 1.5% agarose and visualized by ethidium bromide staining.
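For orientation, the digest composition given above implies a total reaction volume of 17.5 μl and a fifty-fold molar excess of meganuclease over DNA substrate. The Python sketch below simply carries out that arithmetic; the dictionary layout and function name are choices made for this example, and the concentrations are calculated values, not measurements.

```python
# (volume in microliters, stock concentration in micromolar or None)
DIGEST = {
    "DNA substrate": (5.0, 0.05),
    "meganuclease": (2.5, 5.0),
    "SA buffer": (9.5, None),
    "restriction enzyme": (0.5, None),
}


def summarize(digest):
    """Return total volume (ul) and, per solute, (pmol, final nM)."""
    total_ul = sum(volume for volume, _ in digest.values())
    summary = {}
    for name, (volume, stock_um) in digest.items():
        if stock_um is not None:
            pmol = volume * stock_um  # ul x uM = pmol
            summary[name] = (pmol, 1000.0 * pmol / total_ul)
    return total_ul, summary


total_ul, summary = summarize(DIGEST)
print(f"total volume: {total_ul} ul")
for name, (pmol, final_nm) in summary.items():
    print(f"{name}: {pmol:.2f} pmol, {final_nm:.1f} nM final")

ratio = summary["meganuclease"][0] / summary["DNA substrate"][0]
print(f"enzyme:substrate molar ratio = {ratio:.0f}:1")
```

On these figures the substrate is present at roughly 14 nM and the enzyme at roughly 714 nM in the final reaction.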

To evaluate meganuclease half-site preference, rationally-designed meganucleases were incubated with a set of DNA substrates corresponding to a perfect palindrome of the intended half-site as well as each of the 27 possible single-base-pair substitutions in the half-site. In this manner, it was possible to determine how tolerant each enzyme is to deviations from its intended half-site.
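The figure of 27 substitutions follows from nine half-site positions with three alternative bases at each. The Python sketch below enumerates those variants and builds a palindromic substrate from a half-site. It is illustrative only: the function names are chosen for this example, and the four-base center GTGA is borrowed from SEQ ID NO: 4 as an assumption, since the center sequence used in the assay substrates is not stated here.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def single_substitutions(half_site):
    """Return (position, variant) pairs for every single-base substitution.

    Positions are numbered from -9 (first base) to -1 (last base).
    """
    n = len(half_site)
    variants = []
    for i, original in enumerate(half_site):
        for base in BASES:
            if base != original:
                variant = half_site[:i] + base + half_site[i + 1:]
                variants.append((i - n, variant))
    return variants


def palindromic_site(half_site, center="GTGA"):
    """Half-site, assumed center, then the half-site's reverse complement."""
    return half_site + center + reverse_complement(half_site)


variants = single_substitutions("GAAGAGCTC")  # TAT1 half-site, SEQ ID NO: 16
print(len(variants))                          # 27
print(palindromic_site("CAAACTGTC"))          # CAAACTGTCGTGAGACAGTTTG
```

The second printed line reproduces SEQ ID NO: 4, which is why GTGA was chosen as the placeholder center.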

4. Recognition Sequence-Specificity.

Purified recombinant TAT1 and TAT2 meganucleases recognized DNA sequences that were distinct from the wild-type meganuclease recognition sequence (FIG. 2(B)). The wild-type I-CreI meganuclease cleaves the WT recognition sequence, but cuts neither the intended sequence for TAT1 nor the intended sequence for TAT2. TAT1 and TAT2, likewise, cut their intended recognition sequences but not the wild-type sequence. The meganucleases were then evaluated for half-site preference and overall specificity (FIG. 3). Wild-type I-CreI was found to be highly tolerant of single-base-pair substitutions in its natural half-site. In contrast, TAT1 and TAT2 were found to be highly specific and completely intolerant of base substitutions at positions −1, −2, −3, −6, and −8 in the case of TAT1, and positions −1, −2, and −6 in the case of TAT2.

Example 2 Rational Design of Meganucleases with Altered DNA-Binding Affinity

1. Meganucleases with Increased Affinity and Increased Activity.

The meganucleases CCR1 and BRP2 were designed to cleave the half-sites 5′-AACCCTCTC-3′ (SEQ ID NO: 18) and 5′-CTCCGGGTC-3′ (SEQ ID NO: 19), respectively. These enzymes were produced in accordance with Table 1 as in Example 1:

CCR1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              A        A        C        C        C        T        C        T        C
Contact Residues  N32      Y33      R30/E38  R28/E40  E42      Q26      K24/Y68  Q44      R70

BRP2:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              C        T        C        C        G        G        G        T        C
Contact Residues  S32      C33      R30/E38  R28/E40  R42      S26/R77  R68      Q44      R70

Both enzymes were expressed in E. coli, purified, and assayed as in Example 1. Both first generation enzymes were found to cleave their intended recognition sequences with rates that were considerably below that of wild-type I-CreI with its natural recognition sequence. To alleviate this loss in activity, the DNA-binding affinity of CCR1 and BRP2 was increased by mutating E80 to Q in both enzymes. These second-generation versions of CCR1 and BRP2 were found to cleave their intended recognition sequences with substantially increased catalytic rates.

2. Meganucleases with Decreased DNA-Binding Affinity and Decreased Activity but Increased Specificity.

Wild-type I-CreI was found to be highly tolerant of substitutions to its half-site (FIG. 3(A)). In an effort to make the enzyme more specific, the lysine at position 116 of the enzyme, which normally makes a salt-bridge with a phosphate in the DNA backbone, was mutated to aspartic acid to reduce DNA-binding affinity. This rationally-designed enzyme was found to cleave the wild-type recognition sequence with substantially reduced activity but the recombinant enzyme was considerably more specific than wild-type. The half-site preference of the K116D variant was evaluated as in Example 1 and the enzyme was found to be entirely intolerant of deviation from its natural half-site at positions −1, −2, and −3, and displayed at least partial base preference at the remaining six positions in the half-site (FIG. 3(B)).

Example 3 Rationally-Designed Meganuclease Heterodimers

1. Cleavage of Non-Palindromic DNA Sites by Meganuclease Heterodimers Formed in Solution.

Two meganucleases, LAM1 and LAM2, were designed to cleave the half-sites 5′-TGCGGTGTC-3′ (SEQ ID NO: 20) and 5′-CAGGCTGTC-3′ (SEQ ID NO: 21), respectively. The heterodimer of these two enzymes was expected to recognize the DNA sequence 5′-TGCGGTGTCCGGCGACAGCCTG-3′ (SEQ ID NO: 22) found in the bacteriophage λ p05 gene.

LAM1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              T        G        C        G        G        T        G        T        C
Contact Residues  C32      R33      R30/E38  D28/R40  R42      Q26      R68      Q44      R70

LAM2:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              C        A        G        G        C        T        G        T        C
Contact Residues  S32      Y33      E30/R38  R40      K28/E42  Q26      R68      Q44      R70

LAM1 and LAM2 were cloned, expressed in E. coli, and purified individually as described in Example 1. The two enzymes were then mixed 1:1 and incubated at 42° C. for 20 minutes to allow them to exchange subunits and re-equilibrate. The resulting enzyme solution, expected to be a mixture of LAM1 homodimer, LAM2 homodimer, and LAM1/LAM2 heterodimer, was incubated with three different recognition sequences corresponding to the perfect palindrome of the LAM1 half-site, the perfect palindrome of the LAM2 half-site, and the non-palindromic hybrid site found in the bacteriophage λ genome. The purified LAM1 enzyme alone cuts the LAM1 palindromic site, but neither the LAM2 palindromic site nor the LAM1/LAM2 hybrid site. Likewise, the purified LAM2 enzyme alone cuts the LAM2 palindromic site but neither the LAM1 palindromic site nor the LAM1/LAM2 hybrid site. The 1:1 mixture of LAM1 and LAM2, however, cleaves all three DNA sites. Cleavage of the LAM1/LAM2 hybrid site indicates that two distinct redesigned meganucleases can be mixed in solution to form a heterodimeric enzyme capable of cleaving a non-palindromic DNA site.
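As a rough illustration of why a 1:1 mixture is expected to cleave all three sites, the Python sketch below enumerates the dimer species and the site each would be expected to recognize. Two simplifying assumptions are made that are not stated in the specification: subunits are taken to pair at random, which gives the 1:2:1 distribution shown, and the same four-base center (CGGC, taken from SEQ ID NO: 22) is used for all three sites. The function names are likewise chosen for this example.

```python
from itertools import combinations_with_replacement

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def build_site(left_half, right_half, center):
    """Left half-site, center, then reverse complement of the right half-site."""
    return left_half + center + reverse_complement(right_half)


def dimer_fractions(monomer_fractions):
    """Expected dimer fractions if subunits pair at random (an assumption)."""
    fractions = {}
    for a, b in combinations_with_replacement(sorted(monomer_fractions), 2):
        p = monomer_fractions[a] * monomer_fractions[b]
        fractions[(a, b)] = p if a == b else 2 * p
    return fractions


HALF_SITES = {"LAM1": "TGCGGTGTC", "LAM2": "CAGGCTGTC"}  # SEQ ID NOs: 20, 21
CENTER = "CGGC"  # central four bases of SEQ ID NO: 22 (assumed for all sites)

for (a, b), fraction in dimer_fractions({"LAM1": 0.5, "LAM2": 0.5}).items():
    site = build_site(HALF_SITES[a], HALF_SITES[b], CENTER)
    print(f"{a}/{b}: {fraction:.2f} of dimers, expected site {site}")
```

Under these assumptions the LAM1/LAM2 species is paired with the hybrid site of SEQ ID NO: 22, and each homodimer with the palindrome of its own half-site.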

2. Cleavage of Non-Palindromic DNA Sites by Meganuclease Heterodimers Formed by Co-Expression.

Genes encoding the LAM1 and LAM2 enzymes described above were arranged into an operon for simultaneous expression in E. coli as described in Example 1. The co-expressed enzymes were purified as in Example 1 and the enzyme mixture incubated with the three potential recognition sequences described above. The co-expressed enzyme mixture was found to cleave all three sites, including the LAM1/LAM2 hybrid site, indicating that two distinct rationally-designed meganucleases can be co-expressed to form a heterodimeric enzyme capable of cleaving a non-palindromic DNA site.

3. Preferential Cleavage of Non-Palindromic DNA Sites by Meganuclease Heterodimers with Modified Protein-Protein Interfaces.

For applications requiring the cleavage of non-palindromic DNA sites, it is desirable to promote the formation of enzyme heterodimers while minimizing the formation of homodimers that recognize and cleave different (palindromic) DNA sites. To this end, variants of the LAM1 enzyme were produced in which lysines at positions 7, 57, and 96 were changed to glutamic acids. This enzyme was then co-expressed and purified as above with a variant of LAM2 in which glutamic acids at positions 8 and 61 were changed to lysine. In this case, formation of the LAM1 homodimer was expected to be reduced due to electrostatic repulsion between E7, E57, and E96 in one monomer and E8 and E61 in the other monomer. Likewise, formation of the LAM2 homodimer was expected to be reduced due to electrostatic repulsion between K7, K57, and K96 on one monomer and K8 and K61 on the other monomer. Conversely, the LAM1/LAM2 heterodimer was expected to be favored due to electrostatic attraction between E7, E57, and E96 in LAM1 and K8 and K61 in LAM2. When the two meganucleases with modified interfaces were co-expressed and assayed as described above, the LAM1/LAM2 hybrid site was found to be cleaved preferentially over the two palindromic sites, indicating that substitutions in the meganuclease protein-protein interface can drive the preferential formation of heterodimers.
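The charge-pairing argument above can be tabulated with a deliberately crude model, sketched below in Python. The model is an assumption made for illustration and is not a calculation from the specification: lysine is scored +1 and glutamic acid −1, every residue at positions 7, 57, and 96 of one monomer is treated as facing every residue at positions 8 and 61 of the other, and the score is the sum of charge products, so that a negative total indicates net attraction and a positive total net repulsion. No distances or energies are implied.

```python
CHARGE = {"K": 1, "E": -1}

# Residues at the interface positions named in the text (wild-type I-CreI).
WILD_TYPE = {7: "K", 57: "K", 96: "K", 8: "E", 61: "E"}
GROUP_A = (7, 57, 96)  # lysines in the wild-type enzyme
GROUP_B = (8, 61)      # glutamic acids in the wild-type enzyme


def variant(substitutions):
    """Return interface residues after applying {position: residue} changes."""
    residues = dict(WILD_TYPE)
    residues.update(substitutions)
    return residues


def interface_score(monomer_x, monomer_y):
    """Sum of charge products across both faces of the dimer interface."""
    score = 0
    for first, second in ((monomer_x, monomer_y), (monomer_y, monomer_x)):
        for a in GROUP_A:
            for b in GROUP_B:
                score += CHARGE[first[a]] * CHARGE[second[b]]
    return score


LAM1_VARIANT = variant({7: "E", 57: "E", 96: "E"})
LAM2_VARIANT = variant({8: "K", 61: "K"})

print(interface_score(LAM1_VARIANT, LAM1_VARIANT))  # 12  (repulsive)
print(interface_score(LAM2_VARIANT, LAM2_VARIANT))  # 12  (repulsive)
print(interface_score(LAM1_VARIANT, LAM2_VARIANT))  # -12 (attractive)
```

In this toy scoring both modified homodimers come out repulsive and the heterodimer attractive, which matches the qualitative expectation stated in the text.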

Example 4 Additional Meganuclease Heterodimers which Cleave Physiologic DNA Sequences

1. Meganuclease Heterodimers which Cleave DNA Sequences Relevant to Gene Therapy.

A rationally-designed meganuclease heterodimer (ACH1/ACH2) can be produced that cleaves the sequence 5′-CTGGGAGTCTCAGGACAGCCTG-3′ (SEQ ID NO: 23) in the human FGFR3 gene, mutations in which cause achondroplasia. For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

ACH1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              C        T        G        G        G        A        G        T        C
Contact Residues  D32      C33      E30/R38  R40/D28  R42      A26/Q77  R68      Q44      R70

ACH2:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              C        A        G        G        C        T        G        T        C
Contact Residues  D32      Y33      E30/R38  R40      K28/E42  Q26      R68      Q44      R70

A rationally-designed meganuclease heterodimer (HGH1/HGH2) can be produced that cleaves the sequence 5′-CCAGGTGTCTCTGGACTCCTCC-3′ (SEQ ID NO: 24) in the promoter of the Human Growth Hormone gene. For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

HGH1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              C        C        A        G        G        T        G        T        C
Contact Residues  D32      C33      N30/Q38  R40/D28  R42      Q26      R68      Q44      R70

HGH2:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              G        G        A        G        G        A        G        T        C
Contact Residues  K32      R33      N30/Q38  R40/D28  R42      A26      R68      Q44      R70

A rationally-designed meganuclease heterodimer (CF1/CF2) can be produced that cleaves the sequence 5′-GAAAATATCATTGGTGTTTCCT-3′ (SEQ ID NO: 25) in the ΔF508 allele of the human CFTR gene. For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

CF1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              G        A        A        A        A        T        A        T        C
Contact Residues  S32      Y33      N30/Q38  Q40      K28      Q26      H68/C24  Q44      R70

CF2:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              A        G        G        A        A        A        C        A        C
Contact Residues  N32      R33      E30/R38  Q40      K28      A26      Y68/C24  T44      R70

A rationally-designed meganuclease heterodimer (CCR1/CCR2) can be produced that cleaves the sequence 5′-AACCCTCTCCAGTGAGATGCCT-3′ (SEQ ID NO: 26) in the human CCR5 gene (an HIV co-receptor). For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

CCR1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              A        A        C        C        C        T        C        T        C
Contact Residues  N32      Y33      R30/E38  E40/R28  E42      Q26      Y68/K24  Q44      R70

CCR2:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              A        G        G        C        A        T        C        T        C
Contact Residues  N32      R33      E30/R38  E40      K28      Q26      Y68/K24  Q44      R70

A rationally-designed meganuclease heterodimer (MYD1/MYD2) can be produced that cleaves the sequence 5′-GACCTCGTCCTCCGACTCGCTG-3′ (SEQ ID NO: 27) in the 3′ untranslated region of the human DM kinase gene. For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

MYD1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              G        A        C        C        T        C        G        T        C
Contact Residues  S32      Y33      R30/E38  E40/R28  K66      Q26/E77  R68      Q44      R70

MYD2:

Position -9 -8 -7 -6 -5 -4 -3 -2 -1 Base C A G C G A G T C Contact S32Y33 E30/ E40/ R42 A26 R68 Q44 R70 Resi- R38 R28 Q77 dues

2. Meganuclease Heterodimers which Cleave DNA Sequences in Pathogen Genomes.

A rationally-designed meganuclease heterodimer (HSV1/HSV2) can be produced that cleaves the sequence 5′-CTCGATGTCGGACGACACGGCA-3′ (SEQ ID NO: 28) in the UL36 gene of Herpes Simplex Virus-1 and Herpes Simplex Virus-2. For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

HSV1:

Position -9 -8 -7 -6 -5 -4 -3 -2 -1 Base C T C G A T G T C Contact S32C33 R30/ R40/ Q42/ Q26 R68 Q44 R70 Resi- E38 K28 dues

HSV2:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              T        G        C        C        G        T        G        T        C
Contact Residues  C32      R33      R30/E38  E40/R28  R42      Q26      R68      Q44      R70

A rationally-designed meganuclease heterodimer (ANT1/ANT2) can be produced that cleaves the sequence 5′-ACAAGTGTCTATGGACAGTTTA-3′ (SEQ ID NO: 29) in the Bacillus anthracis genome. For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

ANT1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              A        C        A        A        G        T        G        T        C
Contact Residues  N32      C33      N30/Q38  Q40/A28  R42      Q26      R68      Q44      R70

ANT2:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              T        A        A        A        C        T        G        T        C
Contact Residues  C32      Y33      N30/Q38  Q40      E42      Q26      R68      Q44      R70

A rationally-designed meganuclease heterodimer (POX1/POX2) can be produced that cleaves the sequence 5′-AAAACTGTCAAATGACATCGCA-3′ (SEQ ID NO: 30) in the Variola (smallpox) virus gp009 gene. For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

POX1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              A        A        A        A        C        T        G        T        C
Contact Residues  N32      C33      N30/Q38  Q40      K28      Q26      R68      Q44      R70

POX2:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              T        G        C        G        A        T        G        T        C
Contact Residues  C32      R33      R30/E38  R40      C28/Q42  Q26      R68      Q44      R70

A rationally-designed meganuclease homodimer (EBB1/EBB1) can be produced that cleaves the pseudo-palindromic sequence 5′-CGGGGTCTCGTGCGAGGCCTCC-3′ (SEQ ID NO: 31) in the Epstein-Barr Virus BALF2 gene. For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

EBB1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              C        G        G        G        G        T        C        T        C
Contact Residues  S32      R33      D30/Q38  R40/D28  R42      Q26      Y68/K24  Q44      R70

EBB1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              G        G        A        G        G        C        C        T        C
Contact Residues  S32      R33      D30/Q38  R40/D28  R42      Q26      Y68/K24  Q44      R70

3. Meganuclease Heterodimers which Cleave DNA Sequences in Plant Genomes.

A rationally-designed meganuclease heterodimer (GLA1/GLA2) can be produced that cleaves the sequence 5′-CACTAACTCGTATGAGTCGGTG-3′ (SEQ ID NO: 32) in the Arabidopsis thaliana GL2 gene. For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

GLA1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              C        A        C        T        A        A        C        T        C
Contact Residues  S32      Y33      R30/E38  S40/C79  K28      A26/Q77  Y68/K24  Q44      R70

GLA2:

Position -9 -8 -7 -6 -5 -4 -3 -2 -1 Base C A C C G A C T C Contact S32Y33 R30/ E40/ R42 A26 Y68/ Q44 R70 Resi- E38 R28 Q77 K24 dues

A rationally-designed meganuclease heterodimer (BRP1/BRP2) can be produced that cleaves the sequence 5′-TGCCTCCTCTAGAGACCCGGAG-3′ (SEQ ID NO: 33) in the Arabidopsis thaliana BP1 gene. For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

BRP1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              T        G        C        C        T        C        C        T        C
Contact Residues  C32      R33      R30/E38  R28/E40  K66      Q26/E77  Y68/K24  Q44      R70

BRP2:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              C        T        C        C        G        G        G        T        C
Contact Residues  S32      C33      R30/E38  E40/R28  R42      S26/R77  R68      Q44      R70

A rationally-designed meganuclease heterodimer (MGC1/MGC2) can be produced that cleaves the sequence 5′-TAAAATCTCTAAGGTCTGTGCA-3′ (SEQ ID NO: 34) in the Nicotiana tabacum Magnesium Chelatase gene. For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

MGC1:

Position -9 -8 -7 -6 -5 -4 -3 -2 -1 Base T A A A A T C T C Contact C32Y33 N30/ Q40/ K28 Q26 Y68/ Q44 R70 Resi- Q38 K24 dues

MGC2:

Position -9 -8 -7 -6 -5 -4 -3 -2 -1 Base T G C A C A G A C Contact S32R33 R30/ Q40 K28 A26 R68 T44 R70 Resi- E38 Q77 dues

A rationally-designed meganuclease heterodimer (CYP/HGH2) can be produced that cleaves the sequence 5′-CAAGAATTCAAGCGAGCATTAA-3′ (SEQ ID NO: 35) in the Nicotiana tabacum CYP82E4 gene. For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

CYP:

Position -9 -8 -7 -6 -5 -4 -3 -2 -1 Base C A A G A A T T C Contact D32Y33 N30/ R40/ K28 Q77/ Y68 Q44 R70 Resi- Q38 A26 dues

HGH2:

Position -9 -8 -7 -6 -5 -4 -3 -2 -1 Base T T A A T G C T C Contact S32C33 N30/ Q40 K66 R77/ Y68 Q44 R70 Resi- Q38 S26 K24 dues

4. Meganuclease Heterodimers which Cleave DNA Sequences in Yeast Genomes.

A rationally-designed meganuclease heterodimer (URA1/URA2) can be produced that cleaves the sequence 5′-TTAGATGACAAGGGAGACGCAT-3′ (SEQ ID NO: 36) in the Saccharomyces cerevisiae URA3 gene. For example, a meganuclease was designed based on the I-CreI meganuclease, as described above, with the following contact residues and recognition sequence half-sites:

URA1:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              T        T        A        G        A        T        G        A        C
Contact Residues  S32      C33      N30/Q38  R40      K28      Q26      R68      T44      R70

URA2:

Position          -9       -8       -7       -6       -5       -4       -3       -2       -1
Base              A        T        G        C        G        T        C        T        C
Contact Residues  N32      C33      E30/R38  E40/R28  R42      Q26      Y68/K24  Q44      R70

5. Recognition Sequence Specificity.

The rationally-designed meganucleases outlined above in this Example were cloned, expressed in E. coli, and purified as in Example 1. Each purified meganuclease was then mixed 1:1 with its corresponding heterodimerization partner (e.g., ACH1 with ACH2, HGH1 with HGH2, etc.) and incubated with a linearized DNA substrate containing the intended non-palindromic DNA recognition sequence for each meganuclease heterodimer. As shown in FIG. 3, each rationally-designed meganuclease heterodimer cleaves its intended DNA site.

SEQUENCE LISTING

SEQ ID NO: 1 (wild-type I-CreI, Genbank Accession # P05725)
  1 MNTKYNKEFL LYLAGFVDGD GSIIAQIKPN QSYKFKHQLS LAFQVTQKTQ RRWFLDKLVD
 61 EIGVGYVRDR GSVSDYILSE IKPLHNFLTQ LQPFLKLKQK QANLVLKIIW RLPSAKESPD
121 KFLEVCTWVD QIAALNDSKT RKTTSETVRA VLDSLSEKKK SSP

SEQ ID NO: 2 (wild-type I-CreI recognition sequence)
  1 GAAACTGTCT CACGACGTTT TG

SEQ ID NO: 3 (wild-type I-CreI recognition sequence)
  1 GAAAACGTCG TGAGACAGTT TC

SEQ ID NO: 4 (wild-type I-CreI recognition sequence)
  1 CAAACTGTCG TGAGACAGTT TG

SEQ ID NO: 5 (wild-type I-CreI recognition sequence)
  1 CAAACTGTCT CACGACAGTT TG

SEQ ID NO: 6 (wild-type I-MsoI, Genbank Accession # AAL34387)
  1 MTTKNTLQPT EAAYIAGFLD GDGSIYAKLI PRPDYKDIKY QVSLAISFIQ RKDKFPYLQD
 61 IYDQLGKRGN LRKDRGDGIA DYTIIGSTHL SIILPDLVPY LRIKKKQANR ILHIINLYPQ
121 AQKNPSKFLD LVKIVDDVQN LNKRADELKS TNYDRLLEEF LKAGKIESSP

SEQ ID NO: 7 (wild-type I-MsoI recognition sequence)
  1 CAGAACGTCG TGAGACAGTT CC

SEQ ID NO: 8 (wild-type I-MsoI recognition sequence)
  1 GGAACTGTCT CACGACGTTC TG

SEQ ID NO: 9 (wild-type I-SceI, Genbank Accession # CAA09843)
  1 MKNIKKNQVM NLGPNSKLLK EYKSQLIELN IEQFEAGIGL ILGDAYIRSR DEGKTYCMQF
 61 EWKNKAYMDH VCLLYDQWVL SPPHKKERVN HLGNLVITWG AQTFKHQAFN KLANLFIVNN
121 KKTIPNNLVE NYLTPMSLAY WFMDDGGKWD YNKNSTNKSI VLNTQSFTFE EVEYLVKGLR
181 NKFQLNCYVK INKNKPIIYI DSMSYLIFYN LIKPYLIPQM MYKLPNTISS ETFLK

SEQ ID NO: 10 (wild-type I-SceI recognition sequence)
  1 TTACCCTGTT ATCCCTAG

SEQ ID NO: 11 (wild-type I-SceI recognition sequence)
  1 CTAGGGATAA CAGGGTAA

SEQ ID NO: 12 (wild-type I-CeuI, Genbank Accession # P32761)
  1 MSNFILKPGE KLPQDKLEEL KKINDAVKKT KNFSKYLIDL RKLFQIDEVQ VTSESKLFLA
 61 GFLEGEASLN ISTKKLATSK FGLVVDPEFN VTQHVNGVKV LYLALEVFKT GRIRHKSGSN
121 ATLVLTIDNR QSLEEKVIPF YEQYVVAFSS PEKVKRVANF KALLELFNND AHQDLEQLVN
181 KILPIWDQMR KQQGQSNEGF PNLEAAQDFA RNYKKGIK

SEQ ID NO: 13 (wild-type I-CeuI recognition sequence)
  1 ATAACGGTCC TAAGGTAGCG AA

SEQ ID NO: 14 (wild-type I-CeuI recognition sequence)
  1 TTCGCTACCT TAGGACCGTT AT

SEQ ID NO: 15 (HIV-1 TAT gene, partial sequence)
  1 GAAGAGCTCA TCAGAACAGT CA

SEQ ID NO: 16 (rationally-designed TAT1 recognition sequence half-site)
  1 GAAGAGCTC

SEQ ID NO: 17 (rationally-designed TAT2 recognition sequence half-site)
  1 TGACTGTTC

SEQ ID NO: 18 (rationally-designed CCR1 recognition sequence half-site)
  1 AACCCTCTC

SEQ ID NO: 19 (rationally-designed BRP2 recognition sequence half-site)
  1 CTCCGGGTC

SEQ ID NO: 20 (rationally-designed LAM1 recognition sequence half-site)
  1 TGCGGTGTC

SEQ ID NO: 21 (rationally-designed LAM2 recognition sequence half-site)
  1 CAGGCTGTC

SEQ ID NO: 22 (LAM1/LAM2 recognition sequence in bacteriophage λ p05 gene)
  1 TGCGGTGTCC GGCGACAGCC TG

SEQ ID NO: 23 (potential recognition sequence in human FGFR3 gene)
  1 CTGGGAGTCT CAGGACAGCC TG

SEQ ID NO: 24 (potential recognition sequence in human growth hormone promoter)
  1 CCAGGTGTCT CTGGACTCCT CC

SEQ ID NO: 25 (potential recognition sequence in human CFTR gene ΔF508 allele)
  1 GAAAATATCA TTGGTGTTTC CT

SEQ ID NO: 26 (potential recognition sequence in human CCR5 gene)
  1 AACCCTCTCC AGTGAGATGC CT

SEQ ID NO: 27 (potential recognition sequence in human DM kinase gene 3′ UTR)
  1 GACCTCGTCC TCCGACTCGC TG

SEQ ID NO: 28 (potential recognition sequence in Herpes Simplex Virus-1 and Herpes Simplex Virus-2 UL36 gene)
  1 CTCGATGTCG GACGACACGG CA

SEQ ID NO: 29 (potential recognition sequence in Bacillus anthracis genome)
  1 ACAAGTGTCT ATGGACAGTT TA

SEQ ID NO: 30 (potential recognition sequence in the Variola (smallpox) virus gp009 gene)
  1 AAAACTGTCA AATGACATCG CA

SEQ ID NO: 31 (potential recognition sequence in the Epstein-Barr Virus BALF2 gene)
  1 CGGGGTCTCG TGCGAGGCCT CC

SEQ ID NO: 32 (potential recognition sequence in the Arabidopsis thaliana GL2 gene)
  1 CACTAACTCG TATGAGTCGG TG

SEQ ID NO: 33 (potential recognition sequence in the Arabidopsis thaliana BP1 gene)
  1 TGCCTCCTCT AGAGACCCGG AG

SEQ ID NO: 34 (potential recognition sequence in the Nicotiana tabacum Magnesium Chelatase gene)
  1 TAAAATCTCT AAGGTCTGTG CA

SEQ ID NO: 35 (potential recognition sequence in the Nicotiana tabacum CYP82E4 gene)
  1 CAAGAATTCA AGCGAGCATT AA

SEQ ID NO: 36 (potential recognition sequence in the Saccharomyces cerevisiae URA3 gene)
  1 TTAGATGACA AGGGAGACGC AT
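
The relationship between the full recognition sequences and the half-sites listed above can be illustrated with a short sketch. It assumes, based on the 9-base half-sites of SEQ ID NOs: 16-21, that each half-site is read 5′ to 3′ from the outer end of a 22-base recognition sequence: the first half-site is the first nine bases of the listed strand, and the second is the reverse complement of the last nine bases. The function names are hypothetical and are not part of the specification.

```python
from typing import Tuple

# Illustrative sketch only: the helper names below are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def half_sites(recognition_seq: str, half_len: int = 9) -> Tuple[str, str]:
    """Split a recognition sequence into two half-sites.

    Assumption: each half-site is `half_len` bases long and is read
    5'->3' starting from the outer end of the recognition sequence.
    Spaces (as used for grouping in the sequence listing) are ignored.
    """
    seq = recognition_seq.replace(" ", "").upper()
    if len(seq) < 2 * half_len:
        raise ValueError("sequence is shorter than two half-sites")
    first = seq[:half_len]
    second = reverse_complement(seq[-half_len:])
    return first, second


if __name__ == "__main__":
    # SEQ ID NO: 15 (HIV-1 TAT gene, partial sequence)
    print(half_sites("GAAGAGCTCA TCAGAACAGT CA"))
    # SEQ ID NO: 22 (LAM1/LAM2 recognition sequence)
    print(half_sites("TGCGGTGTCC GGCGACAGCC TG"))
```

Under these assumptions, SEQ ID NO: 15 yields the TAT1 and TAT2 half-sites (SEQ ID NOs: 16 and 17), and SEQ ID NO: 22 yields the LAM1 and LAM2 half-sites (SEQ ID NOs: 20 and 21).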

1. A recombinant meganuclease having altered specificity for at least one recognition sequence half-site relative to a wild-type I-CreI meganuclease, comprising: a polypeptide having at least 85% sequence similarity to residues 2-153 of the I-CreI meganuclease of SEQ ID NO: 1; and having specificity for a recognition sequence half-site which differs by at least one base pair from a half-site within an I-CreI meganuclease recognition sequence selected from the group consisting of SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4 and SEQ ID NO: 5; wherein said recombinant meganuclease comprises at least one modification of Table 1 which is not an excluded modification.
2-8. (canceled)
9. A recombinant meganuclease having altered binding affinity for double-stranded DNA relative to a wild-type I-CreI meganuclease, comprising: a polypeptide having at least 85% sequence similarity to residues 2-153 of the I-CreI meganuclease of SEQ ID NO: 1; wherein DNA-binding affinity has been increased by at least one modification corresponding to a substitution selected from the group consisting of: (a) substitution of E80, D137, I81, L112, P29, V64 or Y66 with H, N, Q, S, T, K or R; or (b) substitution of T46, T140 or T143 with K or R; or wherein DNA-binding affinity has been decreased by at least one modification corresponding to a substitution selected from the group consisting of: (a) substitution of K34, K48, R51, K82, K116 or K139 with H, N, Q, S, T, D or E; or (b) substitution of I81, L112, P29, V64, Y66, T46, T140 or T143 with D or E.
10-16. (canceled)
17. A recombinant meganuclease monomer having altered affinity for dimer formation with a reference meganuclease monomer, comprising: a polypeptide having at least 85% sequence similarity to residues 2-153 of the I-CreI meganuclease of SEQ ID NO: 1; wherein affinity for dimer formation has been altered by at least one modification corresponding to a substitution selected from the group consisting of: (a) substitution of K7, K57 or K96 with D or E; or (b) substitution of E8 or E61 with K or R.
18-25. (canceled)
26. A method for producing a genetically-modified eukaryotic cell including an exogenous sequence of interest inserted in a chromosome of said eukaryotic cell, comprising: transfecting a eukaryotic cell with one or more nucleic acids including (i) a first nucleic acid sequence encoding a meganuclease, and (ii) a second nucleic acid sequence including said sequence of interest; wherein said meganuclease produces a cleavage site in said chromosome and said sequence of interest is inserted into said chromosome at said cleavage site; and wherein said meganuclease is a recombinant meganuclease of claim 1.
27-28. (canceled)
29. A method for producing a genetically-modified eukaryotic cell including an exogenous sequence of interest inserted in a chromosome of said eukaryotic cell, comprising: introducing a meganuclease protein into a eukaryotic cell; and transfecting said eukaryotic cell with a nucleic acid including said sequence of interest; wherein said meganuclease produces a cleavage site in said chromosome and said sequence of interest is inserted into said chromosome at said cleavage site; and wherein said meganuclease is a recombinant meganuclease of claim 1.
30-31. (canceled)
32. A method for producing a genetically-modified eukaryotic cell by disrupting a target sequence in a chromosome of said eukaryotic cell, comprising: transfecting a eukaryotic cell with a nucleic acid encoding a meganuclease; wherein said meganuclease produces a cleavage site in said chromosome and said target sequence is disrupted by non-homologous end-joining at said cleavage site; and wherein said meganuclease is a recombinant meganuclease of claim 1.
33-34. (canceled)
35. A method for treating a disease by gene therapy in a eukaryote, comprising: transfecting at least one cell of said eukaryote with one or more nucleic acids including (i) a first nucleic acid sequence encoding a meganuclease, and (ii) a second nucleic acid sequence including a sequence of interest; wherein said meganuclease produces a cleavage site in said chromosome and said sequence of interest is inserted into said chromosome at said cleavage site; wherein said meganuclease is a recombinant meganuclease of claim 1; and wherein insertion of said sequence of interest provides said gene therapy for said disease.
36-37. (canceled)
38. A method for treating a disease by gene therapy in a eukaryote, comprising: introducing a meganuclease protein into at least one cell of said eukaryote; and transfecting said eukaryotic cell with a nucleic acid including a sequence of interest; wherein said meganuclease produces a cleavage site in said chromosome and said sequence of interest is inserted into said chromosome at said cleavage site; wherein said meganuclease is a recombinant meganuclease of claim 1; and wherein insertion of said sequence of interest provides said gene therapy for said disease.
39-40. (canceled)
41. A method for treating a disease by gene therapy in a eukaryote by disrupting a target sequence in a chromosome of said eukaryotic cell, comprising: transfecting at least one cell of said eukaryote with a nucleic acid encoding a meganuclease; wherein said meganuclease produces a cleavage site in said chromosome and said target sequence is disrupted by non-homologous end-joining at said cleavage site; wherein said meganuclease is a recombinant meganuclease of claim 1; and wherein disruption of said target sequence provides said gene therapy for said disease.
42-48. (canceled)